[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161551#comment-14161551 ] Hadoop QA commented on YARN-1879: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673293/YARN-1879.23.patch against trunk revision 0fb2735. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5301//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5301//console This message is automatically generated. 
> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM > fail over > > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, > YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.23.patch, > YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, > YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161550#comment-14161550 ] Hadoop QA commented on YARN-2496: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673291/YARN-2496.patch against trunk revision 0fb2735. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens org.apache.hadoop.yarn.server.resourcemanager.TestResourceManager org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5300//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5300//console This message is automatically generated. 
> [YARN-796] Changes for capacity scheduler to support allocate resource > respect labels > - > > Key: YARN-2496 > URL: https://issues.apache.org/jira/browse/YARN-2496 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, > YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, > YARN-2496.patch > > > This JIRA Includes: > - Add/parse labels option to {{capacity-scheduler.xml}} similar to other > options of queue like capacity/maximum-capacity, etc. > - Include a "default-label-expression" option in queue config, if an app > doesn't specify label-expression, "default-label-expression" of queue will be > used. > - Check if labels can be accessed by the queue when submit an app with > labels-expression to queue or update ResourceRequest with label-expression > - Check labels on NM when trying to allocate ResourceRequest on the NM with > label-expression > - Respect labels when calculate headroom/user-limit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
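[Editor's sketch] The queue options described in the JIRA above could look roughly like the following in {{capacity-scheduler.xml}}. The "labels" and "default-label-expression" option names come from the description itself, but the exact property keys are defined by the patch, not this sketch; treat every key and value here as hypothetical.

```xml
<!-- Hypothetical per-queue label configuration, placed alongside the
     existing capacity options; actual property names come from the patch. -->
<property>
  <name>yarn.scheduler.capacity.root.a.capacity</name>
  <value>50</value>
</property>
<property>
  <!-- labels this queue is allowed to access -->
  <name>yarn.scheduler.capacity.root.a.labels</name>
  <value>GPU,LARGE_MEM</value>
</property>
<property>
  <!-- used when an app does not specify a label-expression -->
  <name>yarn.scheduler.capacity.root.a.default-label-expression</name>
  <value>GPU</value>
</property>
```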
[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161547#comment-14161547 ] Zhijie Shen commented on YARN-2583: --- 1. Be more specific: do we mean that we should find a more scalable method that writes only a single log file per LRS? {code} // we find a more scalable method. {code} 2. Make 30 and 3600 constants of AppLogAggregatorImpl? {code} int configuredRentionSize = conf.getInt(NM_LOG_AGGREGATION_RETAIN_RETENTION_SIZE_PER_APP, 30); {code} {code} if (configuredInterval > 0 && configuredInterval < 3600) { {code} 3. Should this be ">"? {code} if (status.size() >= this.retentionSize) { {code} And should this be "<"? {code} for (int i = 0 ; i <= statusList.size() - this.retentionSize; i++) { {code} 4. Why not use yarnclient? Because of the packaging issue? {code} private ApplicationClientProtocol rmClient; {code} > Modify the LogDeletionService to support Log aggregation for LRS > > > Key: YARN-2583 > URL: https://issues.apache.org/jira/browse/YARN-2583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2583.1.patch, YARN-2583.2.patch, > YARN-2583.3.1.patch, YARN-2583.3.patch > > > Currently, AggregatedLogDeletionService deletes old logs from HDFS. It > checks the cut-off-time, and if all logs for this application are older than > this cut-off-time, the app-log-dir in HDFS is deleted. This will not work for > LRS, because we expect an LRS application to keep running for a long time. > Two different scenarios: > 1) If we configured the rollingIntervalSeconds, new log files will always be > uploaded to HDFS, so the number of log files for this application will grow > larger and larger, and no log files will ever be deleted. > 2) If we did not configure the rollingIntervalSeconds, the log file can only > be uploaded to HDFS after the application is finished. It is very possible > that the logs are uploaded after the cut-off-time, which causes a problem > because by that time the app-log-dir for this application in HDFS has already > been deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
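[Editor's sketch] A minimal illustration of the off-by-one fix suggested in point 3 of the review above (class and method names here are invented, not the patch's actual code): with a strict ">" size check and a "<" loop bound, exactly {{retentionSize}} files survive, whereas ">=" and "<=" would delete one file too many.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the retention check discussed in the review:
// keep only the newest `retentionSize` uploaded log files.
public class LogRetentionSketch {
    // Given an oldest-first list of `numFiles` uploaded logs, return the
    // indices to delete so that at most `retentionSize` entries remain.
    static List<Integer> filesToDelete(int numFiles, int retentionSize) {
        List<Integer> toDelete = new ArrayList<>();
        if (numFiles > retentionSize) {           // strictly ">", as suggested
            for (int i = 0; i < numFiles - retentionSize; i++) {
                toDelete.add(i);                  // delete the oldest first
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        // 5 files, retention 3 -> delete the two oldest (indices 0 and 1)
        System.out.println(filesToDelete(5, 3)); // prints [0, 1]
    }
}
```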
[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1879: - Attachment: YARN-1879.23.patch Marked Idempotent annotations to registerApplicationMaster and finishApplicationMaster. > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM > fail over > > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, > YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.23.patch, > YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, > YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161499#comment-14161499 ] Tsuyoshi OZAWA commented on YARN-1879: -- Thanks for your comments, Jian and Karthik. {quote} from RM’s perspective, these are just new requests, as the new RM doesn’t have any cache for previous requests from client. {quote} I confirmed that this is true. Neither {{finishApplicationMaster}} nor {{registerApplicationMaster}} touches the data in ZK directly, so the RM can handle retried requests transparently in the following cases: 1. When EmbeddedElector chooses a different RM as leader before and after the failover, ZK doesn't have the data of RMAppAttempt/RMApp, so the RM recognizes a retried request as a new request. E.g. there are an active RM (RM1) and a standby RM (RM2), and leadership fails over from RM1 to RM2. 2. Even when EmbeddedElector chooses the same RM as leader before and after the failover, the RM goes into standby state, stops all services before the failover, and reloads the data of RMAppAttempt/RMApp. In this case too, the RM recognizes a retried request as a new request. E.g. there are an active RM (RM1) and a standby RM (RM2), and leadership fails over from RM1 back to RM1. I think there is no problem with marking these methods as Idempotent. 
> Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM > fail over > > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, > YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.3.patch, YARN-1879.4.patch, > YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, > YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
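[Editor's sketch] The mechanism being discussed above can be illustrated with a self-contained toy: a runtime-retained annotation that a retry layer inspects to decide whether a call may be re-sent after RM failover. The real annotation is {{org.apache.hadoop.io.retry.Idempotent}}; it is redeclared here, and the protocol interface is a stand-in for ApplicationMasterProtocol, purely so the example compiles on its own.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Toy model of annotation-driven retry: not Hadoop's actual retry code.
public class IdempotentSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @interface Idempotent {}

    // Stand-in for ApplicationMasterProtocol: a retried register/finish call
    // is safe because a failed-over RM simply treats it as a new request.
    interface AmProtocol {
        @Idempotent void registerApplicationMaster(String host);
        @Idempotent void finishApplicationMaster(String diagnostics);
        void allocate(int containers); // unmarked: needs a stricter policy
    }

    // The RPC retry layer can check the annotation to decide whether a call
    // may be transparently re-sent to the newly active RM.
    static boolean isRetriable(String method, Class<?>... params) {
        try {
            return AmProtocol.class.getMethod(method, params)
                .isAnnotationPresent(Idempotent.class);
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRetriable("registerApplicationMaster", String.class)); // prints true
        System.out.println(isRetriable("allocate", int.class)); // prints false
    }
}
```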
[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161496#comment-14161496 ] Jian He commented on YARN-1857: --- I found that, given {{queueUsedResources >= userConsumed}}, we can simplify the formula to {code} min(userlimit - userConsumed, queueMaxCap - queueUsedResources) {code}. Does this make sense? > CapacityScheduler headroom doesn't account for other AM's running > - > > Key: YARN-1857 > URL: https://issues.apache.org/jira/browse/YARN-1857 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Chen He >Priority: Critical > Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, > YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.patch, > YARN-1857.patch, YARN-1857.patch > > > It's possible for an application to hang forever (or for a long time) in a > cluster with multiple users. The reason is that the headroom sent to the > application is based on the user limit but doesn't account for other > application masters using space in that queue. So the headroom (user limit - > user consumed) can be > 0 even though the cluster is 100% full, because the > other space is being used by application masters from other users. > For instance, if you have a cluster with 1 queue and a user limit of 100%, > with multiple users submitting applications: one very large application by > user 1 starts up, runs most of its maps and starts running reducers. Other > users try to start applications and get their application masters started, > but no tasks. The very large application then gets to the point where it has > consumed the rest of the cluster resources with all reduces, but at this > point it still needs to finish a few maps. The headroom being sent to this > application is only based on the user limit (which is 100% of the cluster > capacity): it's using, let's say, 95% of the cluster for reduces, and the > other 5% is being used by other users running application masters. The > MRAppMaster thinks it still has 5%, so it doesn't know that it should kill a > reduce in order to run a map. > This can happen in other scenarios also. Generally, in a large cluster with > multiple queues this shouldn't cause a hang forever, but it could make the > application take much longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
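[Editor's sketch] The simplified formula from Jian He's comment can be checked against the scenario in the description. This is a scalar toy with invented names, not the actual CapacityScheduler code, and it assumes {{queueUsedResources >= userConsumed}} as stated.

```java
// Scalar sketch of the simplified headroom formula:
//   headroom = min(userLimit - userConsumed, queueMaxCap - queueUsedResources)
// Method and variable names are illustrative, not the scheduler's own.
public class HeadroomSketch {
    static long headroom(long userLimit, long userConsumed,
                         long queueMaxCap, long queueUsedResources) {
        return Math.min(userLimit - userConsumed,
                        queueMaxCap - queueUsedResources);
    }

    public static void main(String[] args) {
        // Scenario from the description: cluster of 100, user limit 100%.
        // User 1 holds 95 (reduces); other users' AMs hold the remaining 5,
        // so the queue is full. The plain "userLimit - userConsumed" headroom
        // would report 5; the simplified formula correctly reports 0.
        System.out.println(headroom(100, 95, 100, 100)); // prints 0
    }
}
```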
[jira] [Updated] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2496: - Attachment: YARN-2496.patch Attached new patch against latest trunk > [YARN-796] Changes for capacity scheduler to support allocate resource > respect labels > - > > Key: YARN-2496 > URL: https://issues.apache.org/jira/browse/YARN-2496 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, > YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, > YARN-2496.patch > > > This JIRA Includes: > - Add/parse labels option to {{capacity-scheduler.xml}} similar to other > options of queue like capacity/maximum-capacity, etc. > - Include a "default-label-expression" option in queue config, if an app > doesn't specify label-expression, "default-label-expression" of queue will be > used. > - Check if labels can be accessed by the queue when submit an app with > labels-expression to queue or update ResourceRequest with label-expression > - Check labels on NM when trying to allocate ResourceRequest on the NM with > label-expression > - Respect labels when calculate headroom/user-limit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2056) Disable preemption at Queue level
[ https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161487#comment-14161487 ] Wangda Tan commented on YARN-2056: -- Hi [~eepayne], Sorry to respond so late. I've carefully read and thought about what you mentioned, especially the algorithm. I think it can resolve most issues, but it cannot guarantee that all cases will be resolved. I think that with your algorithm, the following case will not be handled correctly. {code} total = 100 qA: used = 10, guaranteed = 10, pending = 100 qB: used = 25, guaranteed = 10, pending = 100 (non-preemptable) qC: used = 0, guaranteed = 80, pending = 0 1. At the start, unassigned = 100. It will first exclude qB, unassigned = 80 2. It will try to fill qA, qA = qA + 80 * (10/(10 + 80)) = 18, unassigned = 72. qC will be removed this turn 3. qB will not be added back here, because ideal_assign(qA) = 18, ideal_assign(qB) = 25. 4. All resource will be used by qA. The result should be ideal(qA) = 75, ideal(qB) = 25 {code} In addition, the remove-then-add-back algorithm does not seem very straightforward to me. In my mind, this problem is like filling water into a water tank like the following, where some parts of the tank have stones, making some of them higher than others. Because water flows, the result is equalized (the water surface has the same height everywhere, and some stones can be higher than the water surface). {code} _ | | __| X | |X |__ |X| | X X X| |X X X X| --- 1 2 3 4 5 {code} The algorithm may look like this: {code} At the beginning, every queue sets ideal_assign = its non-preemptable resource, and that amount is deducted from the total remaining resource (these are the stones). All queues are kept in qAlloc. In each turn: - All queues not completely satisfied (ideal_assigned <= min(maximum_capacity, used + pending)) will NOT be removed, like what we have today (the water hasn't reached the ceiling of the tank). - Get the normalized weight of each queue. - Get the queue with the minimum {ideal_assigned % guarantee}, say Q_min - The target_height = (Q_min.ideal_assigned + remained * Q_min.normalized_guarantee) - For each queue, do TempQueue.offer like today - The TempQueue.offer method looks like: * If (q.ideal_assigned > target_height): skip * If (q.ideal_assigned <= target_height): accepted = min(q.maximum, q.used + q.pending, target_height * q.guaranteed) - q.ideal_assigned - If accepted becomes zero, remove the queue from qAlloc like today. The loop exits when total-remained becomes zero (resources are exhausted) or qAlloc becomes empty (all queues are satisfied). {code} I think this algorithm can get a more balanced result. Does this make sense to you? Thanks, Wangda > Disable preemption at Queue level > - > > Key: YARN-2056 > URL: https://issues.apache.org/jira/browse/YARN-2056 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Mayank Bansal >Assignee: Eric Payne > Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, > YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, > YARN-2056.201409181916.txt, YARN-2056.201409210049.txt, > YARN-2056.201409232329.txt, YARN-2056.201409242210.txt > > > We need to be able to disable preemption at individual queue level -- This message was sent by Atlassian JIRA (v6.3.4#6332)
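[Editor's sketch] The water-tank intuition above can be sketched as a small water-filling loop: lock in each queue's non-preemptable usage first (the stones), then repeatedly offer the remainder to still-hungry queues in proportion to their guarantees. This is a simplified scalar model with invented names, not the real ProportionalCapacityPreemptionPolicy, and it freezes the non-preemptable queue at its current usage rather than modeling pending demand behind the non-preemptable floor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified water-filling sketch of the ideal-assignment idea above.
// Queue fields and the scalar resource model are illustrative only.
public class WaterFillSketch {
    static class Q {
        final String name;
        final double guaranteed;
        final double demand; // used + pending, capped by maximum
        double ideal;        // starts at the non-preemptable floor
        Q(String name, double guaranteed, double demand, double floor) {
            this.name = name; this.guaranteed = guaranteed;
            this.demand = demand; this.ideal = floor;
        }
    }

    // Offer `remaining` to queues in proportion to their guarantees; a queue
    // that hits its demand is removed, and the leftover flows to the others.
    static void fill(List<Q> queues, double remaining) {
        List<Q> open = new ArrayList<>(queues);
        while (remaining > 1e-9 && !open.isEmpty()) {
            double gSum = 0;
            for (Q q : open) gSum += q.guaranteed;
            double round = remaining;
            List<Q> next = new ArrayList<>();
            for (Q q : open) {
                double offered = round * q.guaranteed / gSum;
                double accepted = Math.min(offered, q.demand - q.ideal);
                q.ideal += accepted;
                remaining -= accepted;
                if (q.demand - q.ideal > 1e-9) next.add(q); // still hungry
            }
            open = next;
        }
    }

    public static void main(String[] args) {
        // The example from the comment: total = 100, qB's 25 non-preemptable.
        Q qA = new Q("qA", 10, 10 + 100, 0); // used 10, pending 100
        Q qB = new Q("qB", 10, 25, 25);      // frozen at its current usage
        Q qC = new Q("qC", 80, 0, 0);        // no demand
        fill(Arrays.asList(qA, qB, qC), 100 - 25); // 25 already locked to qB
        // qB and qC accept nothing more, so the leftover flows to qA,
        // reaching the desired result ideal(qA) = 75, ideal(qB) = 25.
        System.out.printf("qA=%.0f qB=%.0f qC=%.0f%n",
            qA.ideal, qB.ideal, qC.ideal);
    }
}
```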
[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161465#comment-14161465 ] Steve Loughran commented on YARN-913: - Sanjay, I can do most of these. * w.r.t. the README, we have a javadoc {{package-info.java}}; that's enough. * I propose restricting the custom values that a service record can have to string attributes. Supporting arbitrary JSON opens things up to people embedding entire custom JSON docs in there, which could kill the notion of having semi-standardised records that other apps can work with + published API endpoints for any extra stuff *outside the registry* * I'm going to rename the yarn fields back to {{yarn:id}} and {{yarn:persistence}} if Jersey+Jackson marshals them reliably once they aren't introspection-driven. It makes the yarn-nature of them clearer. > Add a way to register long-lived services in a YARN cluster > --- > > Key: YARN-913 > URL: https://issues.apache.org/jira/browse/YARN-913 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Affects Versions: 2.5.0, 2.4.1 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, > 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, > YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, > YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, > YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, > YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, > YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, > YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, yarnregistry.pdf, > yarnregistry.pdf, yarnregistry.tla > > > In a YARN cluster you can't predict where services will come up -or on what > ports. The services need to work those things out as they come up and then > publish them somewhere. 
> Applications need to be able to find the service instance they are to bond to > -and not any others in the cluster. > Some kind of service registry -in the RM, in ZK, could do this. If the RM > held the write access to the ZK nodes, it would be more secure than having > apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-796: Attachment: (was: YARN-796.node-label.consolidate.13.patch) > Allow for (admin) labels on nodes and resource-requests > --- > > Key: YARN-796 > URL: https://issues.apache.org/jira/browse/YARN-796 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.1 >Reporter: Arun C Murthy >Assignee: Wangda Tan > Attachments: LabelBasedScheduling.pdf, > Node-labels-Requirements-Design-doc-V1.pdf, > Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, > YARN-796.node-label.consolidate.1.patch, > YARN-796.node-label.consolidate.10.patch, > YARN-796.node-label.consolidate.11.patch, > YARN-796.node-label.consolidate.12.patch, > YARN-796.node-label.consolidate.2.patch, > YARN-796.node-label.consolidate.3.patch, > YARN-796.node-label.consolidate.4.patch, > YARN-796.node-label.consolidate.5.patch, > YARN-796.node-label.consolidate.6.patch, > YARN-796.node-label.consolidate.7.patch, > YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, > YARN-796.patch, YARN-796.patch4 > > > It will be useful for admins to specify labels for nodes. Examples of labels > are OS, processor architecture etc. > We should expose these labels and allow applications to specify labels on > resource-requests. > Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-796: Attachment: YARN-796.node-label.consolidate.13.patch > Allow for (admin) labels on nodes and resource-requests > --- > > Key: YARN-796 > URL: https://issues.apache.org/jira/browse/YARN-796 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.1 >Reporter: Arun C Murthy >Assignee: Wangda Tan > Attachments: LabelBasedScheduling.pdf, > Node-labels-Requirements-Design-doc-V1.pdf, > Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, > YARN-796.node-label.consolidate.1.patch, > YARN-796.node-label.consolidate.10.patch, > YARN-796.node-label.consolidate.11.patch, > YARN-796.node-label.consolidate.12.patch, > YARN-796.node-label.consolidate.13.patch, > YARN-796.node-label.consolidate.2.patch, > YARN-796.node-label.consolidate.3.patch, > YARN-796.node-label.consolidate.4.patch, > YARN-796.node-label.consolidate.5.patch, > YARN-796.node-label.consolidate.6.patch, > YARN-796.node-label.consolidate.7.patch, > YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, > YARN-796.patch, YARN-796.patch4 > > > It will be useful for admins to specify labels for nodes. Examples of labels > are OS, processor architecture etc. > We should expose these labels and allow applications to specify labels on > resource-requests. > Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-796: Attachment: (was: YARN-796.node-label.consolidate.13.patch) > Allow for (admin) labels on nodes and resource-requests > --- > > Key: YARN-796 > URL: https://issues.apache.org/jira/browse/YARN-796 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.1 >Reporter: Arun C Murthy >Assignee: Wangda Tan > Attachments: LabelBasedScheduling.pdf, > Node-labels-Requirements-Design-doc-V1.pdf, > Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, > YARN-796.node-label.consolidate.1.patch, > YARN-796.node-label.consolidate.10.patch, > YARN-796.node-label.consolidate.11.patch, > YARN-796.node-label.consolidate.12.patch, > YARN-796.node-label.consolidate.2.patch, > YARN-796.node-label.consolidate.3.patch, > YARN-796.node-label.consolidate.4.patch, > YARN-796.node-label.consolidate.5.patch, > YARN-796.node-label.consolidate.6.patch, > YARN-796.node-label.consolidate.7.patch, > YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, > YARN-796.patch, YARN-796.patch4 > > > It will be useful for admins to specify labels for nodes. Examples of labels > are OS, processor architecture etc. > We should expose these labels and allow applications to specify labels on > resource-requests. > Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-796: Attachment: YARN-796.node-label.consolidate.13.patch Updated to trunk > Allow for (admin) labels on nodes and resource-requests > --- > > Key: YARN-796 > URL: https://issues.apache.org/jira/browse/YARN-796 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.1 >Reporter: Arun C Murthy >Assignee: Wangda Tan > Attachments: LabelBasedScheduling.pdf, > Node-labels-Requirements-Design-doc-V1.pdf, > Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, > YARN-796.node-label.consolidate.1.patch, > YARN-796.node-label.consolidate.10.patch, > YARN-796.node-label.consolidate.11.patch, > YARN-796.node-label.consolidate.12.patch, > YARN-796.node-label.consolidate.2.patch, > YARN-796.node-label.consolidate.3.patch, > YARN-796.node-label.consolidate.4.patch, > YARN-796.node-label.consolidate.5.patch, > YARN-796.node-label.consolidate.6.patch, > YARN-796.node-label.consolidate.7.patch, > YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, > YARN-796.patch, YARN-796.patch4 > > > It will be useful for admins to specify labels for nodes. Examples of labels > are OS, processor architecture etc. > We should expose these labels and allow applications to specify labels on > resource-requests. > Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161403#comment-14161403 ] Hadoop QA commented on YARN-796: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673284/YARN-796.node-label.consolidate.13.patch against trunk revision 519e5a7. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5298//console This message is automatically generated. > Allow for (admin) labels on nodes and resource-requests > --- > > Key: YARN-796 > URL: https://issues.apache.org/jira/browse/YARN-796 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.1 >Reporter: Arun C Murthy >Assignee: Wangda Tan > Attachments: LabelBasedScheduling.pdf, > Node-labels-Requirements-Design-doc-V1.pdf, > Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, > YARN-796.node-label.consolidate.1.patch, > YARN-796.node-label.consolidate.10.patch, > YARN-796.node-label.consolidate.11.patch, > YARN-796.node-label.consolidate.12.patch, > YARN-796.node-label.consolidate.13.patch, > YARN-796.node-label.consolidate.2.patch, > YARN-796.node-label.consolidate.3.patch, > YARN-796.node-label.consolidate.4.patch, > YARN-796.node-label.consolidate.5.patch, > YARN-796.node-label.consolidate.6.patch, > YARN-796.node-label.consolidate.7.patch, > YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, > YARN-796.patch, YARN-796.patch4 > > > It will be useful for admins to specify labels for nodes. Examples of labels > are OS, processor architecture etc. > We should expose these labels and allow applications to specify labels on > resource-requests. > Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161402#comment-14161402 ] Hadoop QA commented on YARN-2496: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673283/YARN-2496.patch against trunk revision 519e5a7. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5297//console This message is automatically generated. > [YARN-796] Changes for capacity scheduler to support allocate resource > respect labels > - > > Key: YARN-2496 > URL: https://issues.apache.org/jira/browse/YARN-2496 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, > YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch > > > This JIRA Includes: > - Add/parse labels option to {{capacity-scheduler.xml}} similar to other > options of queue like capacity/maximum-capacity, etc. > - Include a "default-label-expression" option in queue config, if an app > doesn't specify label-expression, "default-label-expression" of queue will be > used. > - Check if labels can be accessed by the queue when submit an app with > labels-expression to queue or update ResourceRequest with label-expression > - Check labels on NM when trying to allocate ResourceRequest on the NM with > label-expression > - Respect labels when calculate headroom/user-limit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-796: Attachment: YARN-796.node-label.consolidate.13.patch > Allow for (admin) labels on nodes and resource-requests > --- > > Key: YARN-796 > URL: https://issues.apache.org/jira/browse/YARN-796 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.1 >Reporter: Arun C Murthy >Assignee: Wangda Tan > Attachments: LabelBasedScheduling.pdf, > Node-labels-Requirements-Design-doc-V1.pdf, > Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, > YARN-796.node-label.consolidate.1.patch, > YARN-796.node-label.consolidate.10.patch, > YARN-796.node-label.consolidate.11.patch, > YARN-796.node-label.consolidate.12.patch, > YARN-796.node-label.consolidate.13.patch, > YARN-796.node-label.consolidate.2.patch, > YARN-796.node-label.consolidate.3.patch, > YARN-796.node-label.consolidate.4.patch, > YARN-796.node-label.consolidate.5.patch, > YARN-796.node-label.consolidate.6.patch, > YARN-796.node-label.consolidate.7.patch, > YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, > YARN-796.patch, YARN-796.patch4 > > > It will be useful for admins to specify labels for nodes. Examples of labels > are OS, processor architecture etc. > We should expose these labels and allow applications to specify labels on > resource-requests. > Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2496: - Attachment: YARN-2496.patch > [YARN-796] Changes for capacity scheduler to support allocate resource > respect labels > - > > Key: YARN-2496 > URL: https://issues.apache.org/jira/browse/YARN-2496 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, > YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161396#comment-14161396 ] Jian He commented on YARN-1879: --- bq. This is okay only if the RM handles these duplicate requests. In an RM failover, even if the request is a duplicate from the client’s perspective, from the RM’s perspective it is just a new request, as the new RM doesn’t have any cache of previous requests from the client. Just to unblock this, I suggest marking the annotation now so that the operation can be retried on failover, and discussing the internal implementation separately. > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM > fail over > > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, > YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.3.patch, YARN-1879.4.patch, > YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, > YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2647) [YARN-796] Add yarn queue CLI to get queue info including labels of such queue
[ https://issues.apache.org/jira/browse/YARN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G reassigned YARN-2647: - Assignee: Sunil G > [YARN-796] Add yarn queue CLI to get queue info including labels of such queue > -- > > Key: YARN-2647 > URL: https://issues.apache.org/jira/browse/YARN-2647 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Wangda Tan >Assignee: Sunil G > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2647) [YARN-796] Add yarn queue CLI to get queue info including labels of such queue
[ https://issues.apache.org/jira/browse/YARN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161395#comment-14161395 ] Sunil G commented on YARN-2647: --- Hi [~gp.leftnoteasy], I would like to take this up. Thank you. > [YARN-796] Add yarn queue CLI to get queue info including labels of such queue > -- > > Key: YARN-2647 > URL: https://issues.apache.org/jira/browse/YARN-2647 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Wangda Tan > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161373#comment-14161373 ] Hadoop QA commented on YARN-1857: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673238/YARN-1857.6.patch against trunk revision 519e5a7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5296//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5296//console This message is automatically generated. 
> CapacityScheduler headroom doesn't account for other AM's running > - > > Key: YARN-1857 > URL: https://issues.apache.org/jira/browse/YARN-1857 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Chen He >Priority: Critical > Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, > YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.patch, > YARN-1857.patch, YARN-1857.patch > > > It's possible for an application to hang forever (or a long time) in a > cluster with multiple users. The reason is that the headroom sent to the > application is based on the user limit, but it doesn't account for other > Application Masters using space in that queue. So the headroom (user limit - > user consumed) can be > 0 even though the cluster is 100% full, because the > remaining space is being used by Application Masters from other users. > For instance, take a cluster with 1 queue, a user limit of 100%, and > multiple users submitting applications. One very large application by user 1 > starts up, runs most of its maps, and starts running reducers. Other users try > to start applications and get their Application Masters started, but no > tasks. The very large application then gets to the point where it has > consumed the rest of the cluster resources with reduces, but at this > point it still needs to finish a few maps. The headroom being sent to this > application is based only on the user limit (which is 100% of the cluster > capacity): it is using, say, 95% of the cluster for reduces while the other 5% > is being used by other users' Application Masters. The MRAppMaster > thinks it still has 5%, so it doesn't know that it should kill a reduce in > order to run a map. > This can happen in other scenarios also. 
Generally in a large cluster with > multiple queues this shouldn't cause a hang forever but it could cause the > application to take much longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
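A minimal sketch of the arithmetic described above (method names and numbers are illustrative, not CapacityScheduler code): the naive headroom of user limit minus user consumption stays positive even when other users' AMs have taken the remaining capacity, while a headroom capped by the actually free cluster resources correctly reports zero.

```java
// Illustrative sketch of the headroom gap described above; not actual CapacityScheduler code.
class HeadroomSketch {

    // Naive headroom: user limit minus this user's consumption, as in the report.
    static int naiveHeadroom(int userLimit, int userConsumed) {
        return Math.max(0, userLimit - userConsumed);
    }

    // Headroom additionally capped by what is actually free in the cluster,
    // which implicitly accounts for other users' Application Masters.
    static int cappedHeadroom(int userLimit, int userConsumed,
                              int clusterCapacity, int totalConsumed) {
        int free = Math.max(0, clusterCapacity - totalConsumed);
        return Math.min(naiveHeadroom(userLimit, userConsumed), free);
    }

    public static void main(String[] args) {
        // Scenario from the description: 100-unit cluster, user limit 100%,
        // user 1 holds 95 units of reduces, other users' AMs hold the last 5.
        System.out.println("naive=" + naiveHeadroom(100, 95));             // 5, so the MRAppMaster waits
        System.out.println("capped=" + cappedHeadroom(100, 95, 100, 100)); // 0, signals a reduce must be killed
    }
}
```

With the capped value the MRAppMaster would see zero headroom and know it has to preempt one of its own reduces to run the remaining maps.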
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161368#comment-14161368 ] Karthik Kambatla commented on YARN-1879: Thanks for looking at it closely, Jian and Xuan. I had missed some of these points and spent a little more time thinking them through. bq. I think we are mixing two issues in this jira When we mark an API Idempotent or AtMostOnce, the retry-policies will end up re-invoking the API on the other RM in case of a failover. This is okay only if the RM handles these duplicate requests. Further, my understanding is that the behavior of "Idempotent" APIs should be the same on each invocation; i.e., the client should receive the exact same response too. If we handle duplicate requests but return a different response to the client on duplicate calls, we can mark it AtMostOnce. If we return the same response, we can go ahead and mark it Idempotent. Needless to say, the RM should definitely handle duplicate requests gracefully. Does that sound reasonable? > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM > fail over > > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, > YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.3.patch, YARN-1879.4.patch, > YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, > YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
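The mechanism under discussion can be sketched with stand-in annotations (the real ones live in org.apache.hadoop.io.retry; everything below is a self-contained illustration, not the actual Hadoop RPC machinery): a retry policy may safely re-invoke a method on the other RM after failover only if the method carries one of these markers, and an Idempotent method is additionally expected to return the same response on the duplicate call.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// Self-contained sketch; the real annotations live in org.apache.hadoop.io.retry.
class RetrySketch {

    @Retention(RetentionPolicy.RUNTIME) @interface Idempotent {}
    @Retention(RetentionPolicy.RUNTIME) @interface AtMostOnce {}

    // Stand-in for ApplicationMasterProtocol: marking a method tells the retry
    // policy it is safe to re-invoke it on the other RM after a failover.
    interface ProtocolSketch {
        @Idempotent String allocate(String request);
        String unmarked(String request);
    }

    // A retry proxy would consult the annotation before re-invoking on failover.
    static boolean isRetryable(Class<?> protocol, String methodName) {
        try {
            Method m = protocol.getMethod(methodName, String.class);
            return m.isAnnotationPresent(Idempotent.class)
                    || m.isAnnotationPresent(AtMostOnce.class);
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRetryable(ProtocolSketch.class, "allocate"));
        System.out.println(isRetryable(ProtocolSketch.class, "unmarked"));
    }
}
```

This is why marking the annotation alone (issue 1) already changes failover behavior, independently of whether the RM later learns to return identical responses to duplicates (issue 2).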
[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster
[ https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161353#comment-14161353 ] Sanjay Radia commented on YARN-913: --- Some feedback: # rename {{RegistryOperations.create()}} to {{bind()}} # rename {{org/apache/hadoop/yarn/registry/client/services}} to {{org/apache/hadoop/yarn/registry/client/impl}} # move all ZK classes under {{org/apache/hadoop/yarn/registry/client/impl/zk}}, i.e. the current implementations of the registry client # {{RegistryOperations}} implementations to remove declaration of exceptions other than IOE. # {{RegistryOperations.resolve()}} implementation should not mention record headers in exception text: that's an implementation detail # Add a README under {{org.apache.hadoop.yarn.registry.server}} to emphasize this is server-side code # Allow {{ServiceRecord}} to support arbitrary key-values # remove {{yarn_id}} & {{yarn_persistence}} fields from {{ServiceRecord}}, moving them to the set of arbitrary key-values. This ensures that there isn't explicit hard-coding of the assumption "these are YARN apps" in the records. 
> Add a way to register long-lived services in a YARN cluster > --- > > Key: YARN-913 > URL: https://issues.apache.org/jira/browse/YARN-913 > Project: Hadoop YARN > Issue Type: New Feature > Components: api, resourcemanager >Affects Versions: 2.5.0, 2.4.1 >Reporter: Steve Loughran >Assignee: Steve Loughran > Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, > 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, > YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, > YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, > YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, > YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, > YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, > YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, yarnregistry.pdf, > yarnregistry.pdf, yarnregistry.tla > > > In a YARN cluster you can't predict where services will come up -or on what > ports. The services need to work those things out as they come up and then > publish them somewhere. > Applications need to be able to find the service instance they are to bond to > -and not any others in the cluster. > Some kind of service registry -in the RM, in ZK, could do this. If the RM > held the write access to the ZK nodes, it would be more secure than having > apps register with ZK themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
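A toy, in-memory sketch of the shape this feedback suggests (names follow the review points; none of this is the actual YARN-913 API): {{bind()}} instead of {{create()}}, a {{ServiceRecord}} that carries arbitrary key-values rather than hard-coded yarn_id/yarn_persistence fields, and a {{resolve()}} that declares only IOException with no implementation detail in the message.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Toy in-memory registry reflecting the review feedback; not the YARN-913 code.
class RegistrySketch {

    // ServiceRecord carries arbitrary key-values; yarn_id/yarn_persistence would
    // just be two more entries here rather than hard-coded fields.
    static class ServiceRecord {
        final Map<String, String> attributes = new HashMap<>();
        ServiceRecord with(String key, String value) {
            attributes.put(key, value);
            return this;
        }
    }

    private final Map<String, ServiceRecord> store = new HashMap<>();

    // "bind" rather than "create", per the first review point.
    void bind(String path, ServiceRecord record) {
        store.put(path, record);
    }

    // Declares only IOException, and the message avoids implementation details
    // such as record headers.
    ServiceRecord resolve(String path) throws IOException {
        ServiceRecord r = store.get(path);
        if (r == null) {
            throw new IOException("no service record at " + path);
        }
        return r;
    }

    public static void main(String[] args) throws IOException {
        RegistrySketch registry = new RegistrySketch();
        registry.bind("/users/alice/hbase",
                new ServiceRecord().with("yarn_persistence", "application"));
        System.out.println(registry.resolve("/users/alice/hbase").attributes.get("yarn_persistence"));
    }
}
```

Keeping the YARN-specific fields as plain attributes means non-YARN services can publish records without carrying fields that don't apply to them.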
[jira] [Commented] (YARN-2641) improve node decommission latency in RM.
[ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161299#comment-14161299 ] Hadoop QA commented on YARN-2641: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673239/YARN-2641.000.patch against trunk revision 519e5a7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5293//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5293//console This message is automatically generated. > improve node decommission latency in RM. > > > Key: YARN-2641 > URL: https://issues.apache.org/jira/browse/YARN-2641 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2641.000.patch > > > improve node decommission latency in RM. 
> Currently, node decommission only happens after the RM receives a nodeHeartbeat > from the Node Manager. The node heartbeat interval is configurable; the > default value is 1 second. > It would be better to do the decommission during RM refresh (NodesListManager) > instead of nodeHeartbeat (ResourceTrackerService). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
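The proposed change can be sketched as a toy model (illustrative only; not the NodesListManager/ResourceTrackerService code): decommission happens inside refreshNodes itself, so an excluded node's state changes immediately instead of waiting up to a heartbeat interval for its next nodeHeartbeat.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the proposal: mark excluded nodes decommissioned at refresh time
// rather than waiting for their next heartbeat. Not actual RM code.
class DecommissionSketch {

    enum NodeState { RUNNING, DECOMMISSIONED }

    final Map<String, NodeState> nodes = new HashMap<>();
    final Set<String> excludeList = new HashSet<>();

    // Current behavior (simplified): the state changes only when the excluded
    // node's next heartbeat arrives, so latency is up to the heartbeat interval.
    void onHeartbeat(String host) {
        if (excludeList.contains(host)) {
            nodes.put(host, NodeState.DECOMMISSIONED);
        }
    }

    // Proposed behavior (simplified): refreshNodes walks the known nodes and
    // decommissions excluded ones immediately, with no heartbeat round-trip.
    void refreshNodes(Set<String> newExcludeList) {
        excludeList.clear();
        excludeList.addAll(newExcludeList);
        for (String host : excludeList) {
            if (nodes.containsKey(host)) {
                nodes.put(host, NodeState.DECOMMISSIONED);
            }
        }
    }

    public static void main(String[] args) {
        DecommissionSketch rm = new DecommissionSketch();
        rm.nodes.put("node1", NodeState.RUNNING);
        rm.refreshNodes(java.util.Collections.singleton("node1"));
        System.out.println(rm.nodes.get("node1")); // decommissioned without any heartbeat
    }
}
```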
[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161300#comment-14161300 ] Hadoop QA commented on YARN-1857: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673238/YARN-1857.6.patch against trunk revision 519e5a7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5292//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5292//console This message is automatically generated. 
> CapacityScheduler headroom doesn't account for other AM's running > - > > Key: YARN-1857 > URL: https://issues.apache.org/jira/browse/YARN-1857 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Chen He >Priority: Critical > Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, > YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.patch, > YARN-1857.patch, YARN-1857.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1879: -- Summary: Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over (was: Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol) > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM > fail over > > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, > YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.3.patch, YARN-1879.4.patch, > YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, > YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161282#comment-14161282 ] Xuan Gong commented on YARN-2583: - Those test cases fail because of a binding exception; I do not think they are related. > Modify the LogDeletionService to support Log aggregation for LRS > > > Key: YARN-2583 > URL: https://issues.apache.org/jira/browse/YARN-2583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2583.1.patch, YARN-2583.2.patch, > YARN-2583.3.1.patch, YARN-2583.3.patch > > > Currently, AggregatedLogDeletionService deletes old logs from HDFS. It > checks the cut-off time: if all logs for this application are older than > the cut-off time, the app-log-dir in HDFS will be deleted. This will not > work for LRS: we expect an LRS application to keep running for a long time. > Two different scenarios: > 1) If we configured rollingIntervalSeconds, new log files will > always be uploaded to HDFS. The number of log files for this application will > become larger and larger, and no log files will ever be deleted. > 2) If we did not configure rollingIntervalSeconds, the log file can only > be uploaded to HDFS after the application is finished. It is very possible > that the logs are uploaded after the cut-off time, which causes a problem > because by that time the app-log-dir for this application in HDFS has been > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
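The deletion logic the two scenarios above call for can be sketched as a per-file check (the method name and shape below are hypothetical, not from the attached patch): for a finished app the whole app-log-dir goes only once every file is past the cut-off time, while for a running LRS app only individual old rolled files are removed and the directory stays alive for future uploads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch of per-file log deletion for long-running services;
// hypothetical method, not code from the YARN-2583 patch.
class LogDeletionSketch {

    // files maps file name -> last-modified time in millis.
    static List<String> eligibleForDeletion(Map<String, Long> files,
                                            long cutoffTime, boolean appStillRunning) {
        List<String> old = new ArrayList<>();
        for (Map.Entry<String, Long> e : files.entrySet()) {
            if (e.getValue() < cutoffTime) {
                old.add(e.getKey());
            }
        }
        if (!appStillRunning) {
            // Existing behavior, fine for finished apps: delete the whole
            // app-log-dir only once every file is past the cut-off time.
            return old.size() == files.size() ? old : new ArrayList<>();
        }
        // LRS case: delete old rolled files individually and keep the directory,
        // so newly uploaded logs still have somewhere to land.
        return old;
    }
}
```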
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161280#comment-14161280 ] Jian He commented on YARN-1879: --- I think we are mixing two issues in this jira: 1. Mark annotations on the protocol for failover. (RM work-preserving failover won’t work without proper protocol annotations. RetryCache won’t help in this scenario, as the cache simply gets cleaned up after a failover/restart.) 2. Change the API to return the same response for duplicate requests. I propose we do 1) first, which is what really affects work-preserving RM failover, and do 2) separately. > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, > YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.3.patch, YARN-1879.4.patch, > YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, > YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161279#comment-14161279 ] Hadoop QA commented on YARN-2583: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673242/YARN-2583.3.1.patch against trunk revision 519e5a7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.securTests {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5295//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5295//console This message is automatically generated. > Modify the LogDeletionService to support Log aggregation for LRS > > > Key: YARN-2583 > URL: https://issues.apache.org/jira/browse/YARN-2583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2583.1.patch, YARN-2583.2.patch, > YARN-2583.3.1.patch, YARN-2583.3.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161278#comment-14161278 ] Hadoop QA commented on YARN-2583: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673242/YARN-2583.3.1.patch against trunk revision 519e5a7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.securTests {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5294//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5294//console This message is automatically generated. > Modify the LogDeletionService to support Log aggregation for LRS > > > Key: YARN-2583 > URL: https://issues.apache.org/jira/browse/YARN-2583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2583.1.patch, YARN-2583.2.patch, > YARN-2583.3.1.patch, YARN-2583.3.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2649) Flaky test TestAMRMRPCNodeUpdates
[ https://issues.apache.org/jira/browse/YARN-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161262#comment-14161262 ] Hadoop QA commented on YARN-2649: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673231/YARN-2649.patch against trunk revision 519e5a7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5288//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5288//console This message is automatically generated. > Flaky test TestAMRMRPCNodeUpdates > - > > Key: YARN-2649 > URL: https://issues.apache.org/jira/browse/YARN-2649 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ming Ma > Attachments: YARN-2649.patch > > > Sometimes the test fails with the following error: > testAMRMUnusableNodes(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates) > Time elapsed: 41.73 sec <<< FAILURE! 
> junit.framework.AssertionFailedError: AppAttempt state is not correct > (timedout) expected: but was: > at junit.framework.Assert.fail(Assert.java:50) > at junit.framework.Assert.failNotEquals(Assert.java:287) > at junit.framework.Assert.assertEquals(Assert.java:67) > at > org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:382) > at > org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes(TestAMRMRPCNodeUpdates.java:125) > When this happens, SchedulerEventType.NODE_UPDATE was processed before > RMAppAttemptEvent.ATTEMPT_ADDED was processed. That is possible, given the > test only waits for RMAppState.ACCEPTED before having NM sending heartbeat. > This can be reproduced using custom AsyncDispatcher with CountDownLatch. Here > is the log when this happens. > {noformat} > App State is : ACCEPTED > 2014-10-05 21:25:07,305 INFO [AsyncDispatcher event handler] > attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - > appattempt_1412569506932_0001_01 State change from NEW to SUBMITTED > 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the > event > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeStatusEvent.EventType: > STATUS_UPDATE > 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] > rmnode.RMNodeImpl (RMNodeImpl.java:handle(384)) - Processing 127.0.0.1:1234 > of type STATUS_UPDATE > AppAttempt : appattempt_1412569506932_0001_01 State is : SUBMITTED > Waiting for state : ALLOCATED > 2014-10-05 21:25:07,306 DEBUG [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the > event > org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.AppAttemptAddedSchedulerEvent.EventType: > APP_ATTEMPT_ADDED > 2014-10-05 21:25:07,328 
DEBUG [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the > event > org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType: > NODE_UPDATE > 2014-10-05 21:25:07,330 DEBUG [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the > event > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent.EventType: > ATTEMPT_ADDED > 2014-10-05 21:25:07,331 DEBUG [AsyncDispatcher event handler] > attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(658)) - Processing > event for appattempt_1412569506932_0001_000 > 001 of type ATTEMPT_ADDED > 2014-10-05 21:25:07,333 INFO [AsyncDispatcher event handler] > attempt.RMAppAttemptIm
[jira] [Commented] (YARN-2629) Make distributed shell use the domain-based timeline ACLs
[ https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161246#comment-14161246 ] Hadoop QA commented on YARN-2629: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673235/YARN-2629.3.patch against trunk revision 519e5a7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5291//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5291//console This message is automatically generated. 
> Make distributed shell use the domain-based timeline ACLs > - > > Key: YARN-2629 > URL: https://issues.apache.org/jira/browse/YARN-2629 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch > > > For demonstration the usage of this feature (YARN-2102), it's good to make > the distributed shell create the domain, and post its timeline entities into > this private space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
[ https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161241#comment-14161241 ] Tsuyoshi OZAWA commented on YARN-1879: -- For now, I have no idea how to reconstruct the same response after failover; currently the latest patch only returns an empty response. This is one discussion point of this design. > Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol > --- > > Key: YARN-1879 > URL: https://issues.apache.org/jira/browse/YARN-1879 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Jian He >Assignee: Tsuyoshi OZAWA >Priority: Critical > Attachments: YARN-1879.1.patch, YARN-1879.1.patch, > YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, > YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, > YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, > YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, > YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.3.patch, YARN-1879.4.patch, > YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, > YARN-1879.9.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2583: Attachment: YARN-2583.3.1.patch > Modify the LogDeletionService to support Log aggregation for LRS > > > Key: YARN-2583 > URL: https://issues.apache.org/jira/browse/YARN-2583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2583.1.patch, YARN-2583.2.patch, > YARN-2583.3.1.patch, YARN-2583.3.patch > > > Currently, AggregatedLogDeletionService will delete old logs from HDFS. It > will check the cut-off-time, if all logs for this application is older than > this cut-off-time. The app-log-dir from HDFS will be deleted. This will not > work for LRS. We expect a LRS application can keep running for a long time. > Two different scenarios: > 1) If we configured the rollingIntervalSeconds, the new log file will be > always uploaded to HDFS. The number of log files for this application will > become larger and larger. And there is no log files will be deleted. > 2) If we did not configure the rollingIntervalSeconds, the log file can only > be uploaded to HDFS after the application is finished. It is very possible > that the logs are uploaded after the cut-off-time. It will cause problem > because at that time the app-log-dir for this application in HDFS has been > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161238#comment-14161238 ] Hadoop QA commented on YARN-2583: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673234/YARN-2583.3.patch against trunk revision 519e5a7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1268 javac compiler warnings (more than the trunk's current 1267 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5290//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5290//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5290//console This message is automatically generated. 
> Modify the LogDeletionService to support Log aggregation for LRS > > > Key: YARN-2583 > URL: https://issues.apache.org/jira/browse/YARN-2583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2583.1.patch, YARN-2583.2.patch, > YARN-2583.3.1.patch, YARN-2583.3.patch > > > Currently, AggregatedLogDeletionService will delete old logs from HDFS. It > will check the cut-off-time, if all logs for this application is older than > this cut-off-time. The app-log-dir from HDFS will be deleted. This will not > work for LRS. We expect a LRS application can keep running for a long time. > Two different scenarios: > 1) If we configured the rollingIntervalSeconds, the new log file will be > always uploaded to HDFS. The number of log files for this application will > become larger and larger. And there is no log files will be deleted. > 2) If we did not configure the rollingIntervalSeconds, the log file can only > be uploaded to HDFS after the application is finished. It is very possible > that the logs are uploaded after the cut-off-time. It will cause problem > because at that time the app-log-dir for this application in HDFS has been > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161234#comment-14161234 ] Hadoop QA commented on YARN-2583: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673234/YARN-2583.3.patch against trunk revision 519e5a7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1268 javac compiler warnings (more than the trunk's current 1267 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5289//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5289//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5289//console This message is automatically generated. 
> Modify the LogDeletionService to support Log aggregation for LRS > > > Key: YARN-2583 > URL: https://issues.apache.org/jira/browse/YARN-2583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2583.1.patch, YARN-2583.2.patch, YARN-2583.3.patch > > > Currently, AggregatedLogDeletionService will delete old logs from HDFS. It > will check the cut-off-time, if all logs for this application is older than > this cut-off-time. The app-log-dir from HDFS will be deleted. This will not > work for LRS. We expect a LRS application can keep running for a long time. > Two different scenarios: > 1) If we configured the rollingIntervalSeconds, the new log file will be > always uploaded to HDFS. The number of log files for this application will > become larger and larger. And there is no log files will be deleted. > 2) If we did not configure the rollingIntervalSeconds, the log file can only > be uploaded to HDFS after the application is finished. It is very possible > that the logs are uploaded after the cut-off-time. It will cause problem > because at that time the app-log-dir for this application in HDFS has been > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2641) improve node decommission latency in RM.
[ https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2641: Attachment: YARN-2641.000.patch > improve node decommission latency in RM. > > > Key: YARN-2641 > URL: https://issues.apache.org/jira/browse/YARN-2641 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2641.000.patch > > > improve node decommission latency in RM. > Currently the node decommission only happened after RM received nodeHeartbeat > from the Node Manager. The node heartbeat interval is configurable. The > default value is 1 second. > It will be better to do the decommission during RM Refresh(NodesListManager) > instead of nodeHeartbeat(ResourceTrackerService). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161221#comment-14161221 ] Hadoop QA commented on YARN-796: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673185/YARN-796.node-label.consolidate.12.patch against trunk revision 3affad9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 41 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapred.pipes.TestPipeApplication org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5282//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5282//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5282//console This message is automatically generated. 
> Allow for (admin) labels on nodes and resource-requests > --- > > Key: YARN-796 > URL: https://issues.apache.org/jira/browse/YARN-796 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.1 >Reporter: Arun C Murthy >Assignee: Wangda Tan > Attachments: LabelBasedScheduling.pdf, > Node-labels-Requirements-Design-doc-V1.pdf, > Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, > YARN-796.node-label.consolidate.1.patch, > YARN-796.node-label.consolidate.10.patch, > YARN-796.node-label.consolidate.11.patch, > YARN-796.node-label.consolidate.12.patch, > YARN-796.node-label.consolidate.2.patch, > YARN-796.node-label.consolidate.3.patch, > YARN-796.node-label.consolidate.4.patch, > YARN-796.node-label.consolidate.5.patch, > YARN-796.node-label.consolidate.6.patch, > YARN-796.node-label.consolidate.7.patch, > YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, > YARN-796.patch, YARN-796.patch4 > > > It will be useful for admins to specify labels for nodes. Examples of labels > are OS, processor architecture etc. > We should expose these labels and allow applications to specify labels on > resource-requests. > Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
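The label-matching idea in the YARN-796 description above can be illustrated with a toy rule; all names below are hypothetical sketches, not the actual YARN-796 API:

```java
import java.util.Set;

/**
 * Toy sketch of the node-label idea: admins tag nodes with labels such as OS
 * or processor architecture, and a resource-request that names a label can
 * only be satisfied by nodes carrying that label.
 */
public class NodeLabelSketch {

    /** An empty requested label means "no constraint" and matches any node. */
    static boolean nodeSatisfies(Set<String> nodeLabels, String requestedLabel) {
        return requestedLabel.isEmpty() || nodeLabels.contains(requestedLabel);
    }

    public static void main(String[] args) {
        Set<String> gpuNode = Set.of("linux", "x86_64", "gpu");
        Set<String> plainNode = Set.of("linux", "x86_64");

        System.out.println(nodeSatisfies(gpuNode, "gpu"));   // true
        System.out.println(nodeSatisfies(plainNode, "gpu")); // false
        System.out.println(nodeSatisfies(plainNode, ""));    // unconstrained: true
    }
}
```

The admin operations the description asks for (adding/removing node labels) would then amount to mutating each node's label set.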
[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161220#comment-14161220 ] Craig Welch commented on YARN-1857: --- [~john.jian.fang] - uploaded .6 on [YARN-2644]; updated the headroom calculation comment and fixed indentation. > CapacityScheduler headroom doesn't account for other AM's running > - > > Key: YARN-1857 > URL: https://issues.apache.org/jira/browse/YARN-1857 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Chen He >Priority: Critical > Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, > YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.patch, > YARN-1857.patch, YARN-1857.patch > > > It's possible to get an application to hang forever (or for a long time) in a > cluster with multiple users. The reason is that the headroom sent to the > application is based on the user limit, but it doesn't account for other > Application Masters using space in that queue. So the headroom (user limit - > user consumed) can be > 0 even though the cluster is 100% full, because the > other space is being used by application masters from other users. > For instance, suppose you have a cluster with 1 queue, the user limit is 100%, and you have > multiple users submitting applications. One very large application by user 1 > starts up, runs most of its maps, and starts running reducers. Other users try > to start applications and get their application masters started, but no > tasks. The very large application then gets to the point where it has > consumed the rest of the cluster resources with all reduces. But at this > point it still needs to finish a few maps. The headroom being sent to this > application is only based on the user limit (which is 100% of the cluster > capacity); it is using, let's say, 95% of the cluster for reduces, and the other 5% > is being used by other users running application masters. The MRAppMaster > thinks it still has 5%, so it doesn't know that it should kill a reduce in > order to run a map. > This can happen in other scenarios also. Generally, in a large cluster with > multiple queues, this shouldn't cause a hang forever, but it could cause the > application to take much longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
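The headroom gap described in the YARN-1857 report above can be shown numerically. A minimal sketch with illustrative names rather than the CapacityScheduler's actual fields:

```java
/**
 * Sketch of the headroom bug discussed in YARN-1857 (names are hypothetical,
 * not the CapacityScheduler API). The naive formula ignores what OTHER users,
 * e.g. their ApplicationMasters, already consume, so it can report positive
 * headroom in a 100%-full cluster; clamping by the resources that are
 * actually free closes that gap.
 */
public class HeadroomSketch {

    /** Naive: user limit minus this user's own consumption. */
    static int naiveHeadroom(int userLimit, int userConsumed) {
        return userLimit - userConsumed;
    }

    /** Clamped: never more than what is actually unused in the cluster. */
    static int clampedHeadroom(int userLimit, int userConsumed,
                               int clusterTotal, int clusterUsedByAll) {
        return Math.min(userLimit - userConsumed, clusterTotal - clusterUsedByAll);
    }

    public static void main(String[] args) {
        // Scenario from the report: user limit = 100% of a 100-unit cluster,
        // user 1 consumes 95 units, other users' AMs consume the remaining 5.
        int naive = naiveHeadroom(100, 95);               // reports 5 "free" units
        int clamped = clampedHeadroom(100, 95, 100, 100); // cluster is full: 0

        // With naive=5 the MRAppMaster believes it can start a map without
        // killing a reduce, which is exactly the hang described above.
        System.out.println("naive=" + naive + " clamped=" + clamped);
    }
}
```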
[jira] [Updated] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-1857: -- Attachment: YARN-1857.6.patch > CapacityScheduler headroom doesn't account for other AM's running > - > > Key: YARN-1857 > URL: https://issues.apache.org/jira/browse/YARN-1857 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Chen He >Priority: Critical > Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, > YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.patch, > YARN-1857.patch, YARN-1857.patch > > > Its possible to get an application to hang forever (or a long time) in a > cluster with multiple users. The reason why is that the headroom sent to the > application is based on the user limit but it doesn't account for other > Application masters using space in that queue. So the headroom (user limit - > user consumed) can be > 0 even though the cluster is 100% full because the > other space is being used by application masters from other users. > For instance if you have a cluster with 1 queue, user limit is 100%, you have > multiple users submitting applications. One very large application by user 1 > starts up, runs most of its maps and starts running reducers. other users try > to start applications and get their application masters started but not > tasks. The very large application then gets to the point where it has > consumed the rest of the cluster resources with all reduces. But at this > point it needs to still finish a few maps. The headroom being sent to this > application is only based on the user limit (which is 100% of the cluster > capacity) its using lets say 95% of the cluster for reduces and then other 5% > is being used by other users running application masters. 
The MRAppMaster > thinks it still has 5% so it doesn't know that it should kill a reduce in > order to run a map. > This can happen in other scenarios also. Generally in a large cluster with > multiple queues this shouldn't cause a hang forever but it could cause the > application to take much longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2649) Flaky test TestAMRMRPCNodeUpdates
[ https://issues.apache.org/jira/browse/YARN-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161210#comment-14161210 ] Jian He commented on YARN-2649: --- [~mingma], thanks for working on this! bq. Another way to fix it is to change MockRM.submitApp to waitForState on RMAppAttempt. That might address other test cases that use MockRM.submitApp. I recently saw some other similar test failures, e.g. YARN-2483; maybe this is what we should do. Could you also run all tests locally, to make sure we don't introduce regression failures? Thanks. > Flaky test TestAMRMRPCNodeUpdates > - > > Key: YARN-2649 > URL: https://issues.apache.org/jira/browse/YARN-2649 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ming Ma > Attachments: YARN-2649.patch > > > Sometimes the test fails with the following error: > testAMRMUnusableNodes(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates) > Time elapsed: 41.73 sec <<< FAILURE! > junit.framework.AssertionFailedError: AppAttempt state is not correct > (timedout) expected: but was: > at junit.framework.Assert.fail(Assert.java:50) > at junit.framework.Assert.failNotEquals(Assert.java:287) > at junit.framework.Assert.assertEquals(Assert.java:67) > at > org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:382) > at > org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes(TestAMRMRPCNodeUpdates.java:125) > When this happens, SchedulerEventType.NODE_UPDATE was processed before > RMAppAttemptEvent.ATTEMPT_ADDED was processed. That is possible, given the > test only waits for RMAppState.ACCEPTED before having the NM send a heartbeat. > This can be reproduced using a custom AsyncDispatcher with CountDownLatch. Here > is the log when this happens. 
> {noformat} > App State is : ACCEPTED > 2014-10-05 21:25:07,305 INFO [AsyncDispatcher event handler] > attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - > appattempt_1412569506932_0001_01 State change from NEW to SUBMITTED > 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the > event > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeStatusEvent.EventType: > STATUS_UPDATE > 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] > rmnode.RMNodeImpl (RMNodeImpl.java:handle(384)) - Processing 127.0.0.1:1234 > of type STATUS_UPDATE > AppAttempt : appattempt_1412569506932_0001_01 State is : SUBMITTED > Waiting for state : ALLOCATED > 2014-10-05 21:25:07,306 DEBUG [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the > event > org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.AppAttemptAddedSchedulerEvent.EventType: > APP_ATTEMPT_ADDED > 2014-10-05 21:25:07,328 DEBUG [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the > event > org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType: > NODE_UPDATE > 2014-10-05 21:25:07,330 DEBUG [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the > event > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent.EventType: > ATTEMPT_ADDED > 2014-10-05 21:25:07,331 DEBUG [AsyncDispatcher event handler] > attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(658)) - Processing > event for appattempt_1412569506932_0001_000 > 001 of type ATTEMPT_ADDED > 2014-10-05 21:25:07,333 INFO [AsyncDispatcher event handler] > attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - > appattempt_1412569506932_0001_01 State change from SUBMITTED to SCHEDULED > {noformat} -- This 
message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2629) Make distributed shell use the domain-based timeline ACLs
[ https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2629: -- Attachment: YARN-2629.3.patch Upload a new patch: 1. Fix the test failure 2. Remove two lines of unnecessary code in TimelineClientImpl 3. Improve the code of publishing entities in DS AM > Make distributed shell use the domain-based timeline ACLs > - > > Key: YARN-2629 > URL: https://issues.apache.org/jira/browse/YARN-2629 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch > > > For demonstration the usage of this feature (YARN-2102), it's good to make > the distributed shell create the domain, and post its timeline entities into > this private space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2583: Attachment: YARN-2583.3.patch > Modify the LogDeletionService to support Log aggregation for LRS > > > Key: YARN-2583 > URL: https://issues.apache.org/jira/browse/YARN-2583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2583.1.patch, YARN-2583.2.patch, YARN-2583.3.patch > > > Currently, AggregatedLogDeletionService will delete old logs from HDFS. It > will check the cut-off-time, if all logs for this application is older than > this cut-off-time. The app-log-dir from HDFS will be deleted. This will not > work for LRS. We expect a LRS application can keep running for a long time. > Two different scenarios: > 1) If we configured the rollingIntervalSeconds, the new log file will be > always uploaded to HDFS. The number of log files for this application will > become larger and larger. And there is no log files will be deleted. > 2) If we did not configure the rollingIntervalSeconds, the log file can only > be uploaded to HDFS after the application is finished. It is very possible > that the logs are uploaded after the cut-off-time. It will cause problem > because at that time the app-log-dir for this application in HDFS has been > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2649) Flaky test TestAMRMRPCNodeUpdates
[ https://issues.apache.org/jira/browse/YARN-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated YARN-2649: -- Attachment: YARN-2649.patch Fix the test code to wait until RMAppAttemptImpl gets to RMAppAttemptState.SCHEDULED state before having the nm heartbeat. Another way to fix it is to change MockRM.submitApp to waitForState on RMAppAttempt. That might address other test cases that use MockRM.submitApp. > Flaky test TestAMRMRPCNodeUpdates > - > > Key: YARN-2649 > URL: https://issues.apache.org/jira/browse/YARN-2649 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Ming Ma > Attachments: YARN-2649.patch > > > Sometimes the test fails with the following error: > testAMRMUnusableNodes(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates) > Time elapsed: 41.73 sec <<< FAILURE! > junit.framework.AssertionFailedError: AppAttempt state is not correct > (timedout) expected: but was: > at junit.framework.Assert.fail(Assert.java:50) > at junit.framework.Assert.failNotEquals(Assert.java:287) > at junit.framework.Assert.assertEquals(Assert.java:67) > at > org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) > at > org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:382) > at > org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes(TestAMRMRPCNodeUpdates.java:125) > When this happens, SchedulerEventType.NODE_UPDATE was processed before > RMAppAttemptEvent.ATTEMPT_ADDED was processed. That is possible, given the > test only waits for RMAppState.ACCEPTED before having NM sending heartbeat. > This can be reproduced using custom AsyncDispatcher with CountDownLatch. Here > is the log when this happens. 
> {noformat} > App State is : ACCEPTED > 2014-10-05 21:25:07,305 INFO [AsyncDispatcher event handler] > attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - > appattempt_1412569506932_0001_01 State change from NEW to SUBMITTED > 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the > event > org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeStatusEvent.EventType: > STATUS_UPDATE > 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] > rmnode.RMNodeImpl (RMNodeImpl.java:handle(384)) - Processing 127.0.0.1:1234 > of type STATUS_UPDATE > AppAttempt : appattempt_1412569506932_0001_01 State is : SUBMITTED > Waiting for state : ALLOCATED > 2014-10-05 21:25:07,306 DEBUG [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the > event > org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.AppAttemptAddedSchedulerEvent.EventType: > APP_ATTEMPT_ADDED > 2014-10-05 21:25:07,328 DEBUG [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the > event > org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType: > NODE_UPDATE > 2014-10-05 21:25:07,330 DEBUG [AsyncDispatcher event handler] > event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the > event > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent.EventType: > ATTEMPT_ADDED > 2014-10-05 21:25:07,331 DEBUG [AsyncDispatcher event handler] > attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(658)) - Processing > event for appattempt_1412569506932_0001_000001 of type ATTEMPT_ADDED > 2014-10-05 21:25:07,333 INFO [AsyncDispatcher event handler] > attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - > appattempt_1412569506932_0001_01 State change from SUBMITTED to SCHEDULED > {noformat} -- This 
message was sent by Atlassian JIRA (v6.3.4#6332)
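The fix described in the patch above — waiting until the attempt reaches SCHEDULED before sending the NM heartbeat — boils down to a bounded polling loop. A minimal generic sketch follows; the names and signatures are illustrative, not the actual MockAM/MockRM helpers:

```java
import java.util.function.Supplier;

// Generic bounded wait-for-state helper, in the spirit of MockAM.waitForState.
// Illustrative only; the real MockRM API has different signatures.
public class WaitForState {

    /** Polls currentState until it equals expected, or until timeoutMs elapses. */
    public static <T> boolean waitFor(Supplier<T> currentState, T expected,
                                      long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!expected.equals(currentState.get())) {
            if (System.currentTimeMillis() >= deadline) {
                return false;  // timed out; the test should fail here, not race on
            }
            try {
                // State transitions happen on the AsyncDispatcher thread,
                // so we sleep between polls instead of spinning.
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;  // state reached: safe to send the NM heartbeat now
    }
}
```

A test would call something like `waitFor(attempt::getState, SCHEDULED, timeout, poll)` before triggering the heartbeat, removing the race between NODE_UPDATE and ATTEMPT_ADDED.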
[jira] [Commented] (YARN-2644) Recalculate headroom more frequently to keep it accurate
[ https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161174#comment-14161174 ] Hudson commented on YARN-2644: -- SUCCESS: Integrated in Hadoop-trunk-Commit #6202 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6202/]) YARN-2644. Fixed CapacityScheduler to return up-to-date headroom when AM allocates. Contributed by Craig Welch (jianhe: rev 519e5a7dd2bd540105434ec3c8939b68f6c024f8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java > Recalculate headroom more frequently to keep it accurate > > > Key: YARN-2644 > URL: https://issues.apache.org/jira/browse/YARN-2644 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Craig Welch >Assignee: Craig Welch > Fix For: 2.6.0 > > Attachments: YARN-2644.11.patch, YARN-2644.14.patch, > YARN-2644.15.patch, YARN-2644.15.patch > > 
> See parent (1198) for more detail - this specifically covers calculating the > headroom more frequently, to cover the cases where changes have occurred > which impact headroom but which are not reflected due to an application not > being updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2649) Flaky test TestAMRMRPCNodeUpdates
Ming Ma created YARN-2649: - Summary: Flaky test TestAMRMRPCNodeUpdates Key: YARN-2649 URL: https://issues.apache.org/jira/browse/YARN-2649 Project: Hadoop YARN Issue Type: Bug Reporter: Ming Ma Sometimes the test fails with the following error: testAMRMUnusableNodes(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates) Time elapsed: 41.73 sec <<< FAILURE! junit.framework.AssertionFailedError: AppAttempt state is not correct (timedout) expected: but was: at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:382) at org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes(TestAMRMRPCNodeUpdates.java:125) When this happens, SchedulerEventType.NODE_UPDATE was processed before RMAppAttemptEvent.ATTEMPT_ADDED was processed. That is possible, given the test only waits for RMAppState.ACCEPTED before having the NM send a heartbeat. This can be reproduced using a custom AsyncDispatcher with a CountDownLatch. Here is the log when this happens. 
{noformat} App State is : ACCEPTED 2014-10-05 21:25:07,305 INFO [AsyncDispatcher event handler] attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - appattempt_1412569506932_0001_01 State change from NEW to SUBMITTED 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeStatusEvent.EventType: STATUS_UPDATE 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] rmnode.RMNodeImpl (RMNodeImpl.java:handle(384)) - Processing 127.0.0.1:1234 of type STATUS_UPDATE AppAttempt : appattempt_1412569506932_0001_01 State is : SUBMITTED Waiting for state : ALLOCATED 2014-10-05 21:25:07,306 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.AppAttemptAddedSchedulerEvent.EventType: APP_ATTEMPT_ADDED 2014-10-05 21:25:07,328 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType: NODE_UPDATE 2014-10-05 21:25:07,330 DEBUG [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent.EventType: ATTEMPT_ADDED 2014-10-05 21:25:07,331 DEBUG [AsyncDispatcher event handler] attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(658)) - Processing event for appattempt_1412569506932_0001_000001 of type ATTEMPT_ADDED 2014-10-05 21:25:07,333 INFO [AsyncDispatcher event handler] attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - appattempt_1412569506932_0001_01 State change from SUBMITTED to SCHEDULED {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161164#comment-14161164 ] Jian He commented on YARN-1857: --- Could you please update the patch on top of YARN-2644? Comments in the meanwhile: - update the code comments about the new calculation of headroom {code} /** * Headroom is min((userLimit, queue-max-cap) - consumed) */ {code} - fix the indentation of this line {{Resources.subtract(queueMaxCap, usedResources));}} > CapacityScheduler headroom doesn't account for other AM's running > - > > Key: YARN-1857 > URL: https://issues.apache.org/jira/browse/YARN-1857 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Chen He >Priority: Critical > Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, > YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.patch, YARN-1857.patch, > YARN-1857.patch > > > It's possible to get an application to hang forever (or for a long time) in a > cluster with multiple users. The reason is that the headroom sent to the > application is based on the user limit, but it doesn't account for other > application masters using space in that queue. So the headroom (user limit - > user consumed) can be > 0 even though the cluster is 100% full, because the > other space is being used by application masters from other users. > For instance, if you have a cluster with 1 queue, the user limit is 100%, and > multiple users submitting applications: one very large application by user 1 > starts up, runs most of its maps, and starts running reducers. Other users try > to start applications and get their application masters started, but no > tasks. The very large application then gets to the point where it has > consumed the rest of the cluster resources with all reduces. But at this > point it still needs to finish a few maps. 
The headroom being sent to this > application is only based on the user limit (which is 100% of the cluster > capacity); it's using, let's say, 95% of the cluster for reduces, and the other 5% > is being used by other users running application masters. The MRAppMaster > thinks it still has 5%, so it doesn't know that it should kill a reduce in > order to run a map. > This can happen in other scenarios also. Generally, in a large cluster with > multiple queues, this shouldn't cause a hang forever, but it could cause the > application to take much longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
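The gap described in the 95%/5% scenario above can be shown with simple arithmetic. This is a hedged single-resource sketch (plain integers instead of YARN's Resource objects; all names are illustrative): the naive headroom ignores capacity consumed by other users' AMs, while a corrected value is also capped by what the queue actually has free.

```java
// Single-dimension sketch of the YARN-1857 headroom problem.
// Illustrative names only; the real CapacityScheduler computes headroom
// over multi-dimensional Resource objects.
public class HeadroomSketch {

    /** Naive headroom: user limit minus the user's own consumption.
     *  Ignores capacity consumed by OTHER users (e.g. their AM containers). */
    public static int naiveHeadroom(int userLimit, int userConsumed) {
        return Math.max(0, userLimit - userConsumed);
    }

    /** Corrected headroom: roughly min(userLimit, queueMaxCap) - consumed,
     *  additionally capped by the queue's actual free space, never below zero. */
    public static int cappedHeadroom(int userLimit, int queueMaxCap,
                                     int userConsumed, int queueUsed) {
        int limit = Math.min(userLimit, queueMaxCap);
        int byUser = limit - userConsumed;
        int byQueue = queueMaxCap - queueUsed;  // what is actually still free
        return Math.max(0, Math.min(byUser, byQueue));
    }
}
```

With userLimit = queueMaxCap = 100, the big app consuming 95 and the whole queue at 100 (other users' AMs hold the last 5), the naive value is 5 — the misleading number the MRAppMaster sees — while the capped value is 0, which would tell it to preempt a reduce.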
[jira] [Commented] (YARN-1061) NodeManager is indefinitely waiting for nodeHeartBeat() response from ResourceManager.
[ https://issues.apache.org/jira/browse/YARN-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161160#comment-14161160 ] Wilfred Spiegelenburg commented on YARN-1061: - This is a dupe of YARN-2578. Writes do not time out, and they should. > NodeManager is indefinitely waiting for nodeHeartBeat() response from > ResourceManager. > - > > Key: YARN-1061 > URL: https://issues.apache.org/jira/browse/YARN-1061 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.5-alpha >Reporter: Rohith > > It is observed that in one scenario, the NodeManager waits indefinitely for > the nodeHeartbeat response from the ResourceManager when the ResourceManager is > hung. > The NodeManager should get a timeout exception instead of waiting indefinitely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
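One generic way to avoid a call that blocks forever, like the heartbeat above, is to bound it with a Future timeout. This is only an illustrative sketch — it is not the NodeStatusUpdater code, and the actual Hadoop fix belongs at the IPC/RPC layer rather than wrapping every call in an executor:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Bound a potentially hanging call (e.g. an RPC heartbeat) with a timeout.
// Illustrative only; real Hadoop clients configure timeouts at the IPC layer.
public class BoundedCall {

    public static <T> T callWithTimeout(Callable<T> call, long timeoutMs)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<T> f = pool.submit(call);
            // Throws TimeoutException instead of blocking forever when the
            // remote side (here: a hung ResourceManager) never responds.
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            pool.shutdownNow();  // interrupt the stuck call so the thread dies
        }
    }
}
```

The caller catches TimeoutException and can retry or fail over, which is exactly the behavior the report asks for instead of an indefinite wait.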
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161155#comment-14161155 ] Hadoop QA commented on YARN-2496: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673224/YARN-2496.patch against trunk revision 8dc6abf. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5287//console This message is automatically generated. > [YARN-796] Changes for capacity scheduler to support allocate resource > respect labels > - > > Key: YARN-2496 > URL: https://issues.apache.org/jira/browse/YARN-2496 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, > YARN-2496.patch, YARN-2496.patch, YARN-2496.patch > > > This JIRA includes: > - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other > queue options like capacity/maximum-capacity, etc. > - Include a "default-label-expression" option in the queue config; if an app > doesn't specify a label-expression, the queue's "default-label-expression" will be > used. > - Check whether labels can be accessed by the queue when submitting an app with > a label-expression to the queue or updating a ResourceRequest with a label-expression > - Check labels on the NM when trying to allocate a ResourceRequest with a > label-expression on the NM > - Respect labels when calculating headroom/user-limit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2644) Recalculate headroom more frequently to keep it accurate
[ https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161152#comment-14161152 ] Jian He commented on YARN-2644: --- looks good, committing > Recalculate headroom more frequently to keep it accurate > > > Key: YARN-2644 > URL: https://issues.apache.org/jira/browse/YARN-2644 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-2644.11.patch, YARN-2644.14.patch, > YARN-2644.15.patch, YARN-2644.15.patch > > > See parent (1198) for more detail - this specifically covers calculating the > headroom more frequently, to cover the cases where changes have occurred > which impact headroom but which are not reflected due to an application not > being updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2644) Recalculate headroom more frequently to keep it accurate
[ https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161148#comment-14161148 ] Hadoop QA commented on YARN-2644: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673200/YARN-2644.15.patch against trunk revision 3affad9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5285//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5285//console This message is automatically generated. 
> Recalculate headroom more frequently to keep it accurate > > > Key: YARN-2644 > URL: https://issues.apache.org/jira/browse/YARN-2644 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-2644.11.patch, YARN-2644.14.patch, > YARN-2644.15.patch, YARN-2644.15.patch > > > See parent (1198) for more detail - this specifically covers calculating the > headroom more frequently, to cover the cases where changes have occurred > which impact headroom but which are not reflected due to an application not > being updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2377) Localization exception stack traces are not passed as diagnostic info
[ https://issues.apache.org/jira/browse/YARN-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161127#comment-14161127 ] Jason Lowe commented on YARN-2377: -- Thanks for the patch, Gera. I think the toString should be on SerializedException rather than SerializedExceptionPBImpl, since there's nothing implementation-specific about the way it tries to convert to a string -- it always goes through the interfaces to get the necessary things. If a specific implementation really needs a different toString method, it can always override. Nit: sringify should be stringify. Also curious why it isn't static, or otherwise why it doesn't assume e == this and drop the additional parameter, since we can delegate to cause.stringify when processing the cause portion of the traceback. > Localization exception stack traces are not passed as diagnostic info > - > > Key: YARN-2377 > URL: https://issues.apache.org/jira/browse/YARN-2377 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Gera Shegalov >Assignee: Gera Shegalov > Attachments: YARN-2377.v01.patch > > > In the Localizer log one can only see this kind of message: > {code} > 14/07/31 10:29:00 INFO localizer.ResourceLocalizationService: DEBUG: FAILED { > hdfs://ha-nn-uri-0:8020/tmp/hadoop-yarn/staging/gshegalov/.staging/job_1406825443306_0004/job.jar, > 1406827248944, PATTERN, (?:classes/|lib/).* }, > java.net.UnknownHostException: ha-nn-uri-0 > {code} > And then only the {{java.net.UnknownHostException: ha-nn-uri-0}} message is > propagated as diagnostics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
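The review point about delegating to the cause can be illustrated with a plain-Java sketch. This is not the SerializedException API — the class and method below are free-standing illustrations of the shape being suggested (recursing over getCause() with no extra parameter, i.e. assuming e == this at each level):

```java
// Sketch of building a full traceback string by recursing over getCause(),
// the shape suggested for SerializedException.toString() above. Illustrative only.
public class ExceptionStringify {

    public static String stringify(Throwable t) {
        StringBuilder sb = new StringBuilder();
        sb.append(t.getClass().getName());
        if (t.getMessage() != null) {
            sb.append(": ").append(t.getMessage());
        }
        for (StackTraceElement e : t.getStackTrace()) {
            sb.append("\n\tat ").append(e);
        }
        if (t.getCause() != null) {
            // Delegate to the cause instead of passing it as a parameter:
            // each recursive call operates on "its own" throwable.
            sb.append("\nCaused by: ").append(stringify(t.getCause()));
        }
        return sb.toString();
    }
}
```

Propagating the full string (rather than only getMessage()) is what would surface the complete UnknownHostException trace as diagnostic info.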
[jira] [Updated] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2496: - Attachment: YARN-2496.patch > [YARN-796] Changes for capacity scheduler to support allocate resource > respect labels > - > > Key: YARN-2496 > URL: https://issues.apache.org/jira/browse/YARN-2496 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, > YARN-2496.patch, YARN-2496.patch, YARN-2496.patch > > > This JIRA includes: > - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other > queue options like capacity/maximum-capacity, etc. > - Include a "default-label-expression" option in the queue config; if an app > doesn't specify a label-expression, the queue's "default-label-expression" will be > used. > - Check whether labels can be accessed by the queue when submitting an app with > a label-expression to the queue or updating a ResourceRequest with a label-expression > - Check labels on the NM when trying to allocate a ResourceRequest with a > label-expression on the NM > - Respect labels when calculating headroom/user-limit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2633) TestContainerLauncherImpl sometimes fails
[ https://issues.apache.org/jira/browse/YARN-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161038#comment-14161038 ] Hadoop QA commented on YARN-2633: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673174/YARN-2633.patch against trunk revision 687d83c. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5286//console This message is automatically generated. > TestContainerLauncherImpl sometimes fails > - > > Key: YARN-2633 > URL: https://issues.apache.org/jira/browse/YARN-2633 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-2633.patch > > > {noformat} > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.lang.NoSuchMethodException: > org.apache.hadoop.yarn.api.ContainerManagementProtocol$$EnhancerByMockitoWithCGLIB$$25708415.close() > at java.lang.Class.getMethod(Class.java:1665) > at > org.apache.hadoop.yarn.factories.impl.pb.RpcClientFactoryPBImpl.stopClient(RpcClientFactoryPBImpl.java:90) > at > org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.stopProxy(HadoopYarnProtoRPC.java:54) > at > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.mayBeCloseProxy(ContainerManagementProtocolProxy.java:79) > at > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.kill(ContainerLauncherImpl.java:225) > at > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.shutdownAllContainers(ContainerLauncherImpl.java:320) > at > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.serviceStop(ContainerLauncherImpl.java:331) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncherImpl.testMyShutdown(TestContainerLauncherImpl.java:315) > {noformat} -- 
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2644) Recalculate headroom more frequently to keep it accurate
[ https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2644: -- Attachment: YARN-2644.15.patch Re-upload to see if Jenkins can apply the patch now > Recalculate headroom more frequently to keep it accurate > > > Key: YARN-2644 > URL: https://issues.apache.org/jira/browse/YARN-2644 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-2644.11.patch, YARN-2644.14.patch, > YARN-2644.15.patch, YARN-2644.15.patch > > > See parent (1198) for more detail - this specifically covers calculating the > headroom more frequently, to cover the cases where changes have occurred > which impact headroom but which are not reflected due to an application not > being updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160989#comment-14160989 ] Hadoop QA commented on YARN-2583: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673189/YARN-2583.2.patch against trunk revision 3affad9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1268 javac compiler warnings (more than the trunk's current 1267 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.logaggregation.TestAggregatedLogDeletionService org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5283//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5283//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5283//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5283//console This message is automatically generated. > Modify the LogDeletionService to support Log aggregation for LRS > > > Key: YARN-2583 > URL: https://issues.apache.org/jira/browse/YARN-2583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2583.1.patch, YARN-2583.2.patch > > > Currently, AggregatedLogDeletionService deletes old logs from HDFS. It > checks the cut-off time: if all logs for an application are older than > the cut-off time, the app-log-dir is deleted from HDFS. This will not > work for LRS, since we expect an LRS application to keep running for a long time. > Two different scenarios: > 1) If we configured rollingIntervalSeconds, new log files will > always be uploaded to HDFS. The number of log files for the application will > grow larger and larger, and no log files will ever be deleted. > 2) If we did not configure rollingIntervalSeconds, log files can only > be uploaded to HDFS after the application finishes. It is very possible > that the logs are uploaded after the cut-off time. That causes a problem > because by then the app-log-dir for the application in HDFS has already been > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2629) Make distributed shell use the domain-based timeline ACLs
[ https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160963#comment-14160963 ] Hadoop QA commented on YARN-2629: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673184/YARN-2629.2.patch against trunk revision 3affad9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5281//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5281//console This message is automatically generated. 
> Make distributed shell use the domain-based timeline ACLs > - > > Key: YARN-2629 > URL: https://issues.apache.org/jira/browse/YARN-2629 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-2629.1.patch, YARN-2629.2.patch > > > To demonstrate the usage of this feature (YARN-2102), it's good to make > the distributed shell create the domain, and post its timeline entities into > this private space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS
[ https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2583: Attachment: YARN-2583.2.patch > Modify the LogDeletionService to support Log aggregation for LRS > > > Key: YARN-2583 > URL: https://issues.apache.org/jira/browse/YARN-2583 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-2583.1.patch, YARN-2583.2.patch > > > Currently, AggregatedLogDeletionService deletes old logs from HDFS. It > checks the cut-off time: if all logs for an application are older than > the cut-off time, the app-log-dir is deleted from HDFS. This will not > work for LRS, since we expect an LRS application to keep running for a long time. > Two different scenarios: > 1) If we configured rollingIntervalSeconds, new log files will > always be uploaded to HDFS. The number of log files for the application will > grow larger and larger, and no log files will ever be deleted. > 2) If we did not configure rollingIntervalSeconds, log files can only > be uploaded to HDFS after the application finishes. It is very possible > that the logs are uploaded after the cut-off time. That causes a problem > because by then the app-log-dir for the application in HDFS has already been > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-796: Attachment: YARN-796.node-label.consolidate.12.patch > Allow for (admin) labels on nodes and resource-requests > --- > > Key: YARN-796 > URL: https://issues.apache.org/jira/browse/YARN-796 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.1 >Reporter: Arun C Murthy >Assignee: Wangda Tan > Attachments: LabelBasedScheduling.pdf, > Node-labels-Requirements-Design-doc-V1.pdf, > Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, > YARN-796.node-label.consolidate.1.patch, > YARN-796.node-label.consolidate.10.patch, > YARN-796.node-label.consolidate.11.patch, > YARN-796.node-label.consolidate.12.patch, > YARN-796.node-label.consolidate.2.patch, > YARN-796.node-label.consolidate.3.patch, > YARN-796.node-label.consolidate.4.patch, > YARN-796.node-label.consolidate.5.patch, > YARN-796.node-label.consolidate.6.patch, > YARN-796.node-label.consolidate.7.patch, > YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, > YARN-796.patch, YARN-796.patch4 > > > It will be useful for admins to specify labels for nodes. Examples of labels > are OS, processor architecture etc. > We should expose these labels and allow applications to specify labels on > resource-requests. > Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2629) Make distributed shell use the domain-based timeline ACLs
[ https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2629: -- Attachment: YARN-2629.2.patch Add one more option, "-create", which makes the client only try to create a new domain when this flag is set. In addition, fix an existing problem in the DS AM: the AM should use the submitter's UGI to put the entities. Created a patch with these changes. > Make distributed shell use the domain-based timeline ACLs > - > > Key: YARN-2629 > URL: https://issues.apache.org/jira/browse/YARN-2629 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-2629.1.patch, YARN-2629.2.patch > > > To demonstrate the usage of this feature (YARN-2102), it's good to make > the distributed shell create the domain, and post its timeline entities into > this private space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2500) [YARN-796] Miscellaneous changes in ResourceManager to support labels
[ https://issues.apache.org/jira/browse/YARN-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160858#comment-14160858 ] Wangda Tan commented on YARN-2500: -- Regarding comments from [~vinodkv], bq. As with other patches, Labels -> NodeLabels. You'll need to change all of the following:... Addressed bq. ApplicationMasterService: There are multiple this.rmContext.getRMApps().get(appAttemptId.getApplicationId() calls in the allocate method. Refactor to avoid dup calls. Addressed bq. TestSchedulerUtils: testValidateResourceRequestWithErrorLabelsPermission: Why are "" and " " accepted when only x and y are recognized labels? Empty label expression "" should be accepted by any queue, and " " will be trimmed to empty. bq. Given we don't yet support other features in ResourceRequest for the AM container, like priority and locality, shall we also hard-code them to AM_CONTAINER_PRIORITY and ResourceRequest.ANY respectively? Agreed; values are now set to defaults for priority/#container/resource-name/relax-locality. bq. Can we add test-cases for num-containers, priority, and locality for the AM container? Added test "testScheduleTransitionReplaceAMContainerRequestWithDefaults" in RMAppAttemptImpl. Please kindly review. Thanks, Wangda > [YARN-796] Miscellaneous changes in ResourceManager to support labels > - > > Key: YARN-2500 > URL: https://issues.apache.org/jira/browse/YARN-2500 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2500.patch, YARN-2500.patch, YARN-2500.patch, > YARN-2500.patch, YARN-2500.patch > > > This patch contains changes in the ResourceManager to support labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
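The rule discussed above (empty label expression "" accepted by any queue, and " " trimmed to empty before the check) can be sketched as follows; the helper names are hypothetical and not the actual SchedulerUtils code:

```java
import java.util.Set;

// A minimal sketch of the label-expression validation semantics: trim first,
// then accept the empty expression unconditionally, otherwise require that
// the queue's accessible-labels set contains the requested label.
public class LabelExpr {
    public static String normalize(String expr) {
        return expr == null ? "" : expr.trim();
    }

    public static boolean queueAccepts(Set<String> queueLabels, String expr) {
        String e = normalize(expr);
        // Empty expression means "no label requested": accepted everywhere.
        return e.isEmpty() || queueLabels.contains(e);
    }
}
```

This is why " " passes validation in a queue that only recognizes labels x and y: it normalizes to the always-accepted empty expression.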
[jira] [Updated] (YARN-2500) [YARN-796] Miscellaneous changes in ResourceManager to support labels
[ https://issues.apache.org/jira/browse/YARN-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2500: - Attachment: YARN-2500.patch > [YARN-796] Miscellaneous changes in ResourceManager to support labels > - > > Key: YARN-2500 > URL: https://issues.apache.org/jira/browse/YARN-2500 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2500.patch, YARN-2500.patch, YARN-2500.patch, > YARN-2500.patch, YARN-2500.patch > > > This patch contains changes in the ResourceManager to support labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160819#comment-14160819 ] Hadoop QA commented on YARN-1857: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673170/YARN-1857.5.patch against trunk revision ea26cc0. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5279//console This message is automatically generated. > CapacityScheduler headroom doesn't account for other AM's running > - > > Key: YARN-1857 > URL: https://issues.apache.org/jira/browse/YARN-1857 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Chen He >Priority: Critical > Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, > YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.patch, YARN-1857.patch, > YARN-1857.patch > > > It's possible to get an application to hang forever (or for a long time) in a > cluster with multiple users. The reason is that the headroom sent to the > application is based on the user limit, but it doesn't account for other > application masters using space in that queue. So the headroom (user limit - > user consumed) can be > 0 even though the cluster is 100% full, because the > other space is being used by application masters from other users. > For instance, if you have a cluster with 1 queue, the user limit is 100%, and > multiple users are submitting applications: one very large application by user 1 > starts up, runs most of its maps, and starts running reducers.
Other users try > to start applications and get their application masters started, but not their > tasks. The very large application then gets to the point where it has > consumed the rest of the cluster resources with all reduces, but at this > point it still needs to finish a few maps. The headroom being sent to this > application is only based on the user limit (which is 100% of the cluster > capacity): it's using, say, 95% of the cluster for reduces, and the other 5% > is being used by other users running application masters. The MRAppMaster > thinks it still has 5%, so it doesn't know that it should kill a reduce in > order to run a map. > This can happen in other scenarios also. Generally, in a large cluster with > multiple queues, this shouldn't cause a hang forever, but it could cause the > application to take much longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
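The headroom gap described above can be shown with a worked calculation; this is a simplified sketch of the arithmetic, not the CapacityScheduler formula, and both method names are hypothetical:

```java
// Sketch of the YARN-1857 scenario: the naive headroom (user limit minus
// this user's consumption) ignores resources held by other users' AMs, so
// it can be positive even when the cluster has nothing left to give.
public class HeadroomExample {
    /** Headroom as reported to the app: user limit - user consumed. */
    public static int naiveHeadroom(int userLimit, int userConsumed) {
        return Math.max(userLimit - userConsumed, 0);
    }

    /** What the cluster can actually still allocate. */
    public static int actuallyAvailable(int clusterCapacity, int totalConsumed) {
        return Math.max(clusterCapacity - totalConsumed, 0);
    }
}
```

With a 100-unit cluster where user 1 consumes 95 units of reduces and other users' AMs hold the remaining 5, `naiveHeadroom(100, 95)` reports 5 units of headroom while `actuallyAvailable(100, 100)` is 0, so the MRAppMaster never learns it must preempt a reduce to run a map.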
[jira] [Updated] (YARN-2633) TestContainerLauncherImpl sometimes fails
[ https://issues.apache.org/jira/browse/YARN-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-2633: Attachment: YARN-2633.patch Attaching the patch. The caught exception was meant to be ignored, so throwing YarnRuntimeException from the catch clause did not make sense; deleted the line that throws the exception. > TestContainerLauncherImpl sometimes fails > - > > Key: YARN-2633 > URL: https://issues.apache.org/jira/browse/YARN-2633 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-2633.patch > > > {noformat} > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.lang.NoSuchMethodException: > org.apache.hadoop.yarn.api.ContainerManagementProtocol$$EnhancerByMockitoWithCGLIB$$25708415.close() > at java.lang.Class.getMethod(Class.java:1665) > at > org.apache.hadoop.yarn.factories.impl.pb.RpcClientFactoryPBImpl.stopClient(RpcClientFactoryPBImpl.java:90) > at > org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.stopProxy(HadoopYarnProtoRPC.java:54) > at > org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.mayBeCloseProxy(ContainerManagementProtocolProxy.java:79) > at > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.kill(ContainerLauncherImpl.java:225) > at > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.shutdownAllContainers(ContainerLauncherImpl.java:320) > at > org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.serviceStop(ContainerLauncherImpl.java:331) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncherImpl.testMyShutdown(TestContainerLauncherImpl.java:315) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
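The fix pattern described above (ignore the expected close() failure instead of rethrowing it) can be sketched in isolation; this is a simplified illustration with invented names, not the actual RpcClientFactoryPBImpl code:

```java
// Sketch of the YARN-2633 fix: when stopping a client, a NoSuchMethodException
// from a mocked proxy without a close() method is expected in tests and should
// be swallowed, not wrapped in a YarnRuntimeException.
public class CloseQuietly {
    public interface Closer {
        void close() throws Exception;
    }

    /** Returns true if close succeeded, false if the failure was ignored. */
    public static boolean closeQuietly(Closer closer) {
        try {
            closer.close();
            return true;
        } catch (Exception e) {
            // Expected for mock proxies with no close() method: ignore
            // rather than rethrow, so serviceStop can complete cleanly.
            return false;
        }
    }
}
```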
[jira] [Updated] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running
[ https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-1857: -- Attachment: YARN-1857.5.patch Updating to current trunk on the new(er) repo > CapacityScheduler headroom doesn't account for other AM's running > - > > Key: YARN-1857 > URL: https://issues.apache.org/jira/browse/YARN-1857 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Chen He >Priority: Critical > Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, > YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.patch, YARN-1857.patch, > YARN-1857.patch > > > It's possible to get an application to hang forever (or for a long time) in a > cluster with multiple users. The reason is that the headroom sent to the > application is based on the user limit, but it doesn't account for other > application masters using space in that queue. So the headroom (user limit - > user consumed) can be > 0 even though the cluster is 100% full, because the > other space is being used by application masters from other users. > For instance, if you have a cluster with 1 queue, the user limit is 100%, and > multiple users are submitting applications: one very large application by user 1 > starts up, runs most of its maps, and starts running reducers. Other users try > to start applications and get their application masters started, but not their > tasks. The very large application then gets to the point where it has > consumed the rest of the cluster resources with all reduces, but at this > point it still needs to finish a few maps. The headroom being sent to this > application is only based on the user limit (which is 100% of the cluster > capacity): it's using, say, 95% of the cluster for reduces, and the other 5% > is being used by other users running application masters.
The MRAppMaster > thinks it still has 5%, so it doesn't know that it should kill a reduce in > order to run a map. > This can happen in other scenarios also. Generally, in a large cluster with > multiple queues, this shouldn't cause a hang forever, but it could cause the > application to take much longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2644) Recalculate headroom more frequently to keep it accurate
[ https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2644: -- Attachment: YARN-2644.15.patch Update patch against latest trunk in new(er) git repo > Recalculate headroom more frequently to keep it accurate > > > Key: YARN-2644 > URL: https://issues.apache.org/jira/browse/YARN-2644 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-2644.11.patch, YARN-2644.14.patch, > YARN-2644.15.patch > > > See parent (1198) for more detail - this specifically covers calculating the > headroom more frequently, to cover the cases where changes have occurred > which impact headroom but which are not reflected due to an application not > being updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2648) need mechanism for updating HDFS delegation tokens associated with container launch contexts
Jonathan Maron created YARN-2648: Summary: need mechanism for updating HDFS delegation tokens associated with container launch contexts Key: YARN-2648 URL: https://issues.apache.org/jira/browse/YARN-2648 Project: Hadoop YARN Issue Type: Bug Components: nodemanager, resourcemanager Reporter: Jonathan Maron During the launch of a container, the required delegation tokens (e.g. HDFS) are passed to the launch context. If those tokens expire and the container requires a restart, the restart attempt will fail. Sample log output: 2014-10-06 18:37:28,609 WARN ipc.Client (Client.java:run(675)) - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 124 for hbase) can't be found in cache -- This message was sent by Atlassian JIRA (v6.3.4#6332)
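The failure mode above comes down to a timestamp comparison: the token captured at launch has a fixed expiry, so any restart after that instant presents a stale credential. A hedged sketch of the check a refresh mechanism would need (names are illustrative, not the Hadoop security API):

```java
// Sketch of the YARN-2648 problem: a container restart can only reuse the
// launch-context token if the restart happens before the token's expiry;
// otherwise the token must be refreshed/replaced first.
public class TokenCheck {
    public static boolean usableForRestart(long tokenExpiryMillis, long restartTimeMillis) {
        return restartTimeMillis < tokenExpiryMillis;
    }
}
```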
[jira] [Commented] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.
[ https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160687#comment-14160687 ] Hadoop QA commented on YARN-2566: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673150/YARN-2566.003.patch against trunk revision ea26cc0. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5276//console This message is automatically generated. > IOException happen in startLocalizer of DefaultContainerExecutor due to not > enough disk space for the first localDir. > - > > Key: YARN-2566 > URL: https://issues.apache.org/jira/browse/YARN-2566 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2566.000.patch, YARN-2566.001.patch, > YARN-2566.002.patch, YARN-2566.003.patch > > > startLocalizer in DefaultContainerExecutor will only use the first localDir > to copy the token file; if the copy fails for the first localDir due to > insufficient disk space, localization will fail even though there is plenty > of disk space in other localDirs.
We see the following error > for this case: > {code} > 2014-09-13 23:33:25,171 WARN > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to > create app directory > /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 > java.io.IOException: mkdir of > /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987) > 2014-09-13 23:33:25,185 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Localizer failed > java.io.FileNotFoundException: File > file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 > does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501) > at > 
org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111) > at > org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76) > at > org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.<init>(ChecksumFs.java:344) > at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390) > at > org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577) > at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677) > at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.create(FileContext.java:673) > at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021) > at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizati
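The fallback behavior the YARN-2566 patch aims for can be sketched as follows; the helper and interface names are hypothetical, not DefaultContainerExecutor itself:

```java
import java.util.List;

// Sketch of the intended fix: instead of committing to the first localDir
// and failing when it lacks space, try each configured localDir in turn and
// only fail localization when no directory has room.
public class LocalDirPicker {
    public interface SpaceCheck {
        boolean hasEnoughSpace(String dir);
    }

    public static String pickDir(List<String> localDirs, SpaceCheck check) {
        for (String dir : localDirs) {
            if (check.hasEnoughSpace(dir)) {
                return dir; // first directory with enough space wins
            }
        }
        return null; // no directory has space: localization genuinely fails
    }
}
```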
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160684#comment-14160684 ] Hadoop QA commented on YARN-2496: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673145/YARN-2496.patch against trunk revision ea26cc0. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5277//console This message is automatically generated. > [YARN-796] Changes for capacity scheduler to support allocate resource > respect labels > - > > Key: YARN-2496 > URL: https://issues.apache.org/jira/browse/YARN-2496 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, > YARN-2496.patch, YARN-2496.patch > > > This JIRA includes: > - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other > queue options like capacity/maximum-capacity, etc. > - Include a "default-label-expression" option in the queue config; if an app > doesn't specify a label-expression, the queue's "default-label-expression" will be > used. > - Check whether labels can be accessed by the queue when submitting an app with > a label-expression to the queue or updating a ResourceRequest with a label-expression > - Check labels on the NM when trying to allocate a ResourceRequest on the NM with > a label-expression > - Respect labels when calculating headroom/user-limit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
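The "default-label-expression" behavior listed above can be sketched as a small resolution helper; the class and method names are hypothetical, not the CapacityScheduler code:

```java
// Sketch of default-label-expression resolution: if the app's request carries
// a non-blank label expression, use it; otherwise fall back to the queue's
// configured default (or the empty expression if no default is set).
public class DefaultLabelExpr {
    public static String effectiveExpression(String requestExpr, String queueDefault) {
        if (requestExpr != null && !requestExpr.trim().isEmpty()) {
            return requestExpr.trim();
        }
        return queueDefault == null ? "" : queueDefault.trim();
    }
}
```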
[jira] [Commented] (YARN-2544) [YARN-796] Common server side PB changes (not include user API PB changes)
[ https://issues.apache.org/jira/browse/YARN-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160673#comment-14160673 ] Hadoop QA commented on YARN-2544: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12673138/YARN-2544.patch against trunk revision ea26cc0. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5275//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5275//console This message is automatically generated. 
> [YARN-796] Common server side PB changes (not include user API PB changes) > -- > > Key: YARN-2544 > URL: https://issues.apache.org/jira/browse/YARN-2544 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2544.patch, YARN-2544.patch, YARN-2544.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2576) Prepare yarn-1051 branch for merging with trunk
[ https://issues.apache.org/jira/browse/YARN-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-2576: - Fix Version/s: 2.6.0 > Prepare yarn-1051 branch for merging with trunk > --- > > Key: YARN-2576 > URL: https://issues.apache.org/jira/browse/YARN-2576 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager, scheduler >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Fix For: 2.6.0 > > Attachments: YARN-2576.patch, YARN-2576.patch > > > This JIRA is to track the changes required to ensure branch yarn-1051 is > ready to be merged with trunk. This includes fixing any compilation issues, > findbugs and/or javadoc warnings, test case failures, etc., if any. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2576) Prepare yarn-1051 branch for merging with trunk
[ https://issues.apache.org/jira/browse/YARN-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-2576: - Target Version/s: (was: 3.0.0) > Prepare yarn-1051 branch for merging with trunk > --- > > Key: YARN-2576 > URL: https://issues.apache.org/jira/browse/YARN-2576 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager, scheduler >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Fix For: 2.6.0 > > Attachments: YARN-2576.patch, YARN-2576.patch > > > This JIRA is to track the changes required to ensure branch yarn-1051 is > ready to be merged with trunk. This includes fixing any compilation issues, > findbugs and/or javadoc warnings, test case failures, etc., if any. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2475) ReservationSystem: replan upon capacity reduction
[ https://issues.apache.org/jira/browse/YARN-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-2475: - Target Version/s: (was: 3.0.0) > ReservationSystem: replan upon capacity reduction > - > > Key: YARN-2475 > URL: https://issues.apache.org/jira/browse/YARN-2475 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.6.0 > > Attachments: YARN-2475.patch, YARN-2475.patch, YARN-2475.patch > > > In the context of YARN-1051, if capacity of the cluster drops significantly > upon machine failures we need to trigger a reorganization of the planned > reservations. As reservations are "absolute" it is possible that they will > not all fit, and some need to be rejected a-posteriori. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2611) Fix jenkins findbugs warning and test case failures for trunk merge patch
[ https://issues.apache.org/jira/browse/YARN-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-2611: - Target Version/s: (was: 3.0.0) > Fix jenkins findbugs warning and test case failures for trunk merge patch > - > > Key: YARN-2611 > URL: https://issues.apache.org/jira/browse/YARN-2611 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager, scheduler >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Fix For: 2.6.0 > > Attachments: YARN-2611.patch > > > This JIRA is to fix jenkins findbugs warnings and test case failures for > trunk merge patch as [reported | > https://issues.apache.org/jira/browse/YARN-1051?focusedCommentId=14148506] in > YARN-1051 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2475) ReservationSystem: replan upon capacity reduction
[ https://issues.apache.org/jira/browse/YARN-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-2475: - Fix Version/s: 2.6.0 > ReservationSystem: replan upon capacity reduction > - > > Key: YARN-2475 > URL: https://issues.apache.org/jira/browse/YARN-2475 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.6.0 > > Attachments: YARN-2475.patch, YARN-2475.patch, YARN-2475.patch > > > In the context of YARN-1051, if capacity of the cluster drops significantly > upon machine failures we need to trigger a reorganization of the planned > reservations. As reservations are "absolute" it is possible that they will > not all fit, and some need to be rejected a-posteriori. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2611) Fix jenkins findbugs warning and test case failures for trunk merge patch
[ https://issues.apache.org/jira/browse/YARN-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-2611: - Fix Version/s: 2.6.0 > Fix jenkins findbugs warning and test case failures for trunk merge patch > - > > Key: YARN-2611 > URL: https://issues.apache.org/jira/browse/YARN-2611 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager, scheduler >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Fix For: 2.6.0 > > Attachments: YARN-2611.patch > > > This JIRA is to fix jenkins findbugs warnings and test case failures for > trunk merge patch as [reported | > https://issues.apache.org/jira/browse/YARN-1051?focusedCommentId=14148506] in > YARN-1051 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2385) Consider splitting getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue
[ https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-2385: - Fix Version/s: (was: 2.6.0) > Consider splitting getAppsinQueue to getRunningAppsInQueue + > getPendingAppsInQueue > -- > > Key: YARN-2385 > URL: https://issues.apache.org/jira/browse/YARN-2385 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Labels: abstractyarnscheduler > > Currently getAppsinQueue returns both pending & running apps. The purpose of > the JIRA is to explore splitting it into getRunningAppsInQueue + > getPendingAppsInQueue, which will provide more flexibility to callers -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2389) Adding support for draining a queue, i.e., killing all apps in the queue
[ https://issues.apache.org/jira/browse/YARN-2389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-2389: - Target Version/s: (was: 2.6.0) > Adding support for draining a queue, i.e., killing all apps in the queue > > > Key: YARN-2389 > URL: https://issues.apache.org/jira/browse/YARN-2389 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Labels: capacity-scheduler, fairscheduler > Fix For: 2.6.0 > > Attachments: YARN-2389-1.patch, YARN-2389.patch > > > This is a parallel JIRA to YARN-2378. Fair scheduler already supports moving > a single application from one queue to another. This will add support to move > all applications from the specified source queue to the target. This will use > YARN-2385, so it will work for both the Capacity & Fair schedulers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2385) Consider splitting getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue
[ https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-2385: - Fix Version/s: 2.6.0 > Consider splitting getAppsinQueue to getRunningAppsInQueue + > getPendingAppsInQueue > -- > > Key: YARN-2385 > URL: https://issues.apache.org/jira/browse/YARN-2385 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Labels: abstractyarnscheduler > Fix For: 2.6.0 > > > Currently getAppsinQueue returns both pending & running apps. The purpose of > the JIRA is to explore splitting it into getRunningAppsInQueue + > getPendingAppsInQueue, which will provide more flexibility to callers -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-2378: - Target Version/s: (was: 2.6.0) > Adding support for moving apps between queues in Capacity Scheduler > --- > > Key: YARN-2378 > URL: https://issues.apache.org/jira/browse/YARN-2378 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Labels: capacity-scheduler > Fix For: 2.6.0 > > Attachments: YARN-2378-1.patch, YARN-2378.patch, YARN-2378.patch, > YARN-2378.patch, YARN-2378.patch > > > As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 > to smaller patches for manageability. This JIRA will address adding support > for moving apps between queues in Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-2080: - Target Version/s: (was: 3.0.0) > Admission Control: Integrate Reservation subsystem with ResourceManager > --- > > Key: YARN-2080 > URL: https://issues.apache.org/jira/browse/YARN-2080 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Fix For: 2.6.0 > > Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, > YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch > > > This JIRA tracks the integration of Reservation subsystem data structures > introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring > of YARN-1051. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1712) Admission Control: plan follower
[ https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-1712: - Target Version/s: (was: 3.0.0) > Admission Control: plan follower > > > Key: YARN-1712 > URL: https://issues.apache.org/jira/browse/YARN-1712 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Labels: reservations, scheduler > Fix For: 2.6.0 > > Attachments: YARN-1712.1.patch, YARN-1712.2.patch, YARN-1712.3.patch, > YARN-1712.4.patch, YARN-1712.5.patch, YARN-1712.patch > > > This JIRA tracks a thread that continuously propagates the current state of > a reservation subsystem to the scheduler. As the inventory subsystem stores > the "plan" of how the resources should be subdivided, the work we propose in > this JIRA realizes that plan by dynamically instructing the CapacityScheduler > to add/remove/resize queues to follow the plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager
[ https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-2080: - Fix Version/s: 2.6.0 > Admission Control: Integrate Reservation subsystem with ResourceManager > --- > > Key: YARN-2080 > URL: https://issues.apache.org/jira/browse/YARN-2080 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Subru Krishnan >Assignee: Subru Krishnan > Fix For: 2.6.0 > > Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, > YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch > > > This JIRA tracks the integration of the Reservation subsystem data structures > introduced in YARN-1709 with the YARN RM. This is essentially end-to-end wiring > of YARN-1051. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1712) Admission Control: plan follower
[ https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-1712: - Fix Version/s: 2.6.0 > Admission Control: plan follower > > > Key: YARN-1712 > URL: https://issues.apache.org/jira/browse/YARN-1712 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Labels: reservations, scheduler > Fix For: 2.6.0 > > Attachments: YARN-1712.1.patch, YARN-1712.2.patch, YARN-1712.3.patch, > YARN-1712.4.patch, YARN-1712.5.patch, YARN-1712.patch > > > This JIRA tracks a thread that continuously propagates the current state of > the reservation subsystem to the scheduler. As the inventory subsystem stores > the "plan" of how the resources should be subdivided, the work we propose in > this JIRA realizes that plan by dynamically instructing the CapacityScheduler > to add/remove/resize queues to follow the plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
[ https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-1711: - Target Version/s: (was: 3.0.0) > CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709 > -- > > Key: YARN-1711 > URL: https://issues.apache.org/jira/browse/YARN-1711 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Carlo Curino > Labels: reservations > Fix For: 2.6.0 > > Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.3.patch, > YARN-1711.4.patch, YARN-1711.5.patch, YARN-1711.patch > > > This JIRA tracks the development of a policy that enforces user quotas (a > time-extension of the notion of capacity) in the inventory subsystem > discussed in YARN-1709. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
[ https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-1711: - Fix Version/s: 2.6.0 > CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709 > -- > > Key: YARN-1711 > URL: https://issues.apache.org/jira/browse/YARN-1711 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Carlo Curino > Labels: reservations > Fix For: 2.6.0 > > Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.3.patch, > YARN-1711.4.patch, YARN-1711.5.patch, YARN-1711.patch > > > This JIRA tracks the development of a policy that enforces user quotas (a > time-extension of the notion of capacity) in the inventory subsystem > discussed in YARN-1709. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1710) Admission Control: agents to allocate reservation
[ https://issues.apache.org/jira/browse/YARN-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-1710: - Fix Version/s: 2.6.0 > Admission Control: agents to allocate reservation > - > > Key: YARN-1710 > URL: https://issues.apache.org/jira/browse/YARN-1710 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.6.0 > > Attachments: YARN-1710.1.patch, YARN-1710.2.patch, YARN-1710.3.patch, > YARN-1710.4.patch, YARN-1710.patch > > > This JIRA tracks the algorithms used to allocate a user ReservationRequest > coming in from the new reservation API (YARN-1708) in the inventory > subsystem (YARN-1709), which maintains the current plan for the cluster. The focus > of these "agents" is to quickly find a solution that satisfies the constraints > provided by the user and the physical constraints of the plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1710) Admission Control: agents to allocate reservation
[ https://issues.apache.org/jira/browse/YARN-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-1710: - Target Version/s: (was: 3.0.0) > Admission Control: agents to allocate reservation > - > > Key: YARN-1710 > URL: https://issues.apache.org/jira/browse/YARN-1710 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Carlo Curino >Assignee: Carlo Curino > Fix For: 2.6.0 > > Attachments: YARN-1710.1.patch, YARN-1710.2.patch, YARN-1710.3.patch, > YARN-1710.4.patch, YARN-1710.patch > > > This JIRA tracks the algorithms used to allocate a user ReservationRequest > coming in from the new reservation API (YARN-1708) in the inventory > subsystem (YARN-1709), which maintains the current plan for the cluster. The focus > of these "agents" is to quickly find a solution that satisfies the constraints > provided by the user and the physical constraints of the plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1709) Admission Control: Reservation subsystem
[ https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-1709: - Target Version/s: (was: 3.0.0) > Admission Control: Reservation subsystem > > > Key: YARN-1709 > URL: https://issues.apache.org/jira/browse/YARN-1709 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Carlo Curino >Assignee: Subru Krishnan > Fix For: 2.6.0 > > Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, > YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch > > > This JIRA is about the key data structure used to track resources over time > to enable YARN-1051. The Reservation subsystem is conceptually a "plan" of > how the scheduler will allocate resources over time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1709) Admission Control: Reservation subsystem
[ https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-1709: - Fix Version/s: 2.6.0 > Admission Control: Reservation subsystem > > > Key: YARN-1709 > URL: https://issues.apache.org/jira/browse/YARN-1709 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Carlo Curino >Assignee: Subru Krishnan > Fix For: 2.6.0 > > Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, > YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch > > > This JIRA is about the key data structure used to track resources over time > to enable YARN-1051. The Reservation subsystem is conceptually a "plan" of > how the scheduler will allocate resources over time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1708) Add a public API to reserve resources (part of YARN-1051)
[ https://issues.apache.org/jira/browse/YARN-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-1708: - Target Version/s: (was: 3.0.0) > Add a public API to reserve resources (part of YARN-1051) > - > > Key: YARN-1708 > URL: https://issues.apache.org/jira/browse/YARN-1708 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Carlo Curino >Assignee: Subru Krishnan > Fix For: 2.6.0 > > Attachments: YARN-1708.patch, YARN-1708.patch, YARN-1708.patch, > YARN-1708.patch > > > This JIRA tracks the definition of a new public API for YARN, which allows > users to reserve resources (think of time-bounded queues). This is part of > the admission control enhancement proposed in YARN-1051. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.
[ https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160654#comment-14160654 ] zhihai xu commented on YARN-2566: - I don't see the problem "-1 javac. The patch appears to cause the build to fail." in my local build. Restarting the Jenkins test. > IOException happen in startLocalizer of DefaultContainerExecutor due to not > enough disk space for the first localDir. > - > > Key: YARN-2566 > URL: https://issues.apache.org/jira/browse/YARN-2566 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2566.000.patch, YARN-2566.001.patch, > YARN-2566.002.patch, YARN-2566.003.patch > > > startLocalizer in DefaultContainerExecutor uses only the first localDir > to copy the token file; if that copy fails because the first localDir does not > have enough disk space, localization fails even though there is plenty of disk > space in the other localDirs. 
We see the following error > for this case: > {code} > 2014-09-13 23:33:25,171 WARN > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to > create app directory > /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 > java.io.IOException: mkdir of > /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987) > 2014-09-13 23:33:25,185 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Localizer failed > java.io.FileNotFoundException: File > file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 > does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501) > at > 
org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111) > at > org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76) > at > org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.(ChecksumFs.java:344) > at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390) > at > org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577) > at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677) > at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.create(FileContext.java:673) > at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021) > at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987) > 2014-09-13 23:33:25,186 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: > Container container_1410663092546_0004_01_01 transitioned from > LOCALIZING to LOCALIZATION_FAILED > 2014-09-13 23:33:25,187 WARN > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=cloudera > OPERATION=Container Finished - Failed TARGET=ContainerImpl > RESU
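The fix direction discussed in this issue can be sketched as trying each configured localDir in turn instead of failing on the first one. This is a minimal illustrative sketch, not the actual patch: the names LocalDirPicker, CopyAction, and copyToFirstUsableDir are hypothetical, and the real DefaultContainerExecutor API differs.

```java
import java.io.IOException;
import java.util.List;

// Illustrative sketch of the behavior proposed in YARN-2566: instead of
// copying the token file only into the first localDir, try each localDir
// in turn and fall back to the next on failure (e.g. disk full).
// All class and method names here are hypothetical, not YARN's API.
public class LocalDirPicker {

    /** Functional hook standing in for "copy the token file into this dir". */
    public interface CopyAction {
        void copyTo(String dir) throws IOException;
    }

    /** Returns the first localDir where the copy succeeds, or rethrows. */
    public static String copyToFirstUsableDir(List<String> localDirs,
                                              CopyAction action) throws IOException {
        IOException last = null;
        for (String dir : localDirs) {
            try {
                action.copyTo(dir);   // attempt the copy in this localDir
                return dir;           // success: localization proceeds here
            } catch (IOException e) {
                last = e;             // e.g. "No space left on device"; try next
            }
        }
        throw new IOException("copy failed in all localDirs", last);
    }
}
```

With this shape, a full first disk no longer fails the whole localization as long as any other localDir has space.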
[jira] [Updated] (YARN-1708) Add a public API to reserve resources (part of YARN-1051)
[ https://issues.apache.org/jira/browse/YARN-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-1708: - Fix Version/s: 2.6.0 > Add a public API to reserve resources (part of YARN-1051) > - > > Key: YARN-1708 > URL: https://issues.apache.org/jira/browse/YARN-1708 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Carlo Curino >Assignee: Subru Krishnan > Fix For: 2.6.0 > > Attachments: YARN-1708.patch, YARN-1708.patch, YARN-1708.patch, > YARN-1708.patch > > > This JIRA tracks the definition of a new public API for YARN, which allows > users to reserve resources (think of time-bounded queues). This is part of > the admission control enhancement proposed in YARN-1051. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated
[ https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160653#comment-14160653 ] Jian He commented on YARN-2312: --- Patch looks good to me too, thanks Jason for reviewing the patch. bq. Wondering if there should be a utility method on ContainerId to provide this value or if the masking constant should be obtainable from ContainerId. I prefer exposing the constant. > Marking ContainerId#getId as deprecated > --- > > Key: YARN-2312 > URL: https://issues.apache.org/jira/browse/YARN-2312 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, > YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch, > YARN-2312.4.patch, YARN-2312.5.patch > > > After YARN-2229, {{ContainerId#getId}} returns only a partial value of the > container id: the sequence number without the epoch. We should > mark {{ContainerId#getId}} as deprecated and use > {{ContainerId#getContainerId}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
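The epoch/sequence split behind this deprecation can be sketched with bit masking. Assumption: the long container id packs the sequence number in its low bits and the RM restart epoch in the high bits; the 40-bit split used below is illustrative, and ContainerIdBits is a hypothetical class, not YARN's ContainerId.

```java
// Hypothetical sketch of why getId is lossy after YARN-2229: the full long
// container id carries both an epoch (high bits) and a sequence number
// (low bits). Truncating to the sequence alone drops the epoch. The 40-bit
// boundary here is an assumption for illustration.
public class ContainerIdBits {
    static final long SEQUENCE_BITMASK = 0xffffffffffL; // low 40 bits

    /** Sequence number only -- what a truncated getId-style accessor sees. */
    static long sequenceOf(long containerId) {
        return containerId & SEQUENCE_BITMASK;
    }

    /** Epoch component, lost if only the sequence is exposed. */
    static long epochOf(long containerId) {
        return containerId >>> 40;
    }
}
```

Exposing the mask constant, as the comment prefers, lets callers do this decomposition themselves without a new utility method.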
[jira] [Updated] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-1707: - Target Version/s: (was: 3.0.0) > Making the CapacityScheduler more dynamic > - > > Key: YARN-1707 > URL: https://issues.apache.org/jira/browse/YARN-1707 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Labels: capacity-scheduler > Fix For: 2.6.0 > > Attachments: YARN-1707.10.patch, YARN-1707.2.patch, > YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.6.patch, > YARN-1707.7.patch, YARN-1707.8.patch, YARN-1707.9.patch, YARN-1707.patch > > > The CapacityScheduler is rather static at the moment, and refreshqueue > provides a rather heavy-handed way to reconfigure it. Moving towards > long-running services (tracked in YARN-896) and to enable more advanced > admission control and resource parcelling, we need to make the > CapacityScheduler more dynamic. This is instrumental to the umbrella JIRA > YARN-1051. > Concretely, this requires the following changes: > * create queues dynamically > * destroy queues dynamically > * dynamically change queue parameters (e.g., capacity) > * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% > instead of == 100% > We limit this to LeafQueues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
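The relaxed validation rule in the last bullet, sum(child.getCapacity()) <= 100% instead of == 100%, can be sketched as follows. This is a minimal illustration under that rule, not the actual CapacityScheduler code; QueueCapacityCheck is a hypothetical name.

```java
import java.util.List;

// Illustrative sketch of the relaxed refreshqueue validation proposed in
// YARN-1707: child queue capacities may sum to AT MOST 100% (leaving slack
// for dynamically created queues) rather than exactly 100%.
// Not the actual CapacityScheduler implementation.
public class QueueCapacityCheck {
    static void validate(List<Float> childCapacities) {
        float sum = 0f;
        for (float c : childCapacities) {
            sum += c;
        }
        // relaxed rule: <= 100% (small float tolerance) instead of == 100%
        if (sum > 100f + 1e-3f) {
            throw new IllegalArgumentException(
                "child capacities sum to " + sum + "% > 100%");
        }
    }
}
```

Under the old == 100% rule a 70% configuration would be rejected; under the relaxed rule it is accepted and the remaining 30% stays available for queues added at runtime.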
[jira] [Updated] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.
[ https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2566: Attachment: (was: YARN-2566.003.patch) > IOException happen in startLocalizer of DefaultContainerExecutor due to not > enough disk space for the first localDir. > - > > Key: YARN-2566 > URL: https://issues.apache.org/jira/browse/YARN-2566 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2566.000.patch, YARN-2566.001.patch, > YARN-2566.002.patch, YARN-2566.003.patch > > > startLocalizer in DefaultContainerExecutor uses only the first localDir > to copy the token file; if that copy fails because the first localDir does not > have enough disk space, localization fails even though there is plenty of disk > space in the other localDirs. We see the following error > for this case: > {code} > 2014-09-13 23:33:25,171 WARN > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to > create app directory > /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 > java.io.IOException: mkdir of > /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426) > at > 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987) > 2014-09-13 23:33:25,185 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Localizer failed > java.io.FileNotFoundException: File > file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 > does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501) > at > org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111) > at > org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76) > at > org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.(ChecksumFs.java:344) > at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390) > at > org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577) > at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677) > at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.create(FileContext.java:673) > at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021) > at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102) > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987) > 2014-09-13 23:33:25,186 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: > Container container_1410663092546_0004_01_01 transitioned from > LOCALIZING to LOCALIZATION_FAILED > 2014-09-13 23:33:25,187 WARN > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=cloudera > OPERATION=Container Finished - Failed TARGET=ContainerImpl > RESULT=FAILURE DESCRIPTION=Container failed with state: LOCALIZATION_FAILED > APPID=application_1410663092546_0004 > CONTAINERID=conta
[jira] [Updated] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.
[ https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2566: Attachment: YARN-2566.003.patch > IOException happen in startLocalizer of DefaultContainerExecutor due to not > enough disk space for the first localDir. > - > > Key: YARN-2566 > URL: https://issues.apache.org/jira/browse/YARN-2566 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: zhihai xu >Assignee: zhihai xu > Attachments: YARN-2566.000.patch, YARN-2566.001.patch, > YARN-2566.002.patch, YARN-2566.003.patch > > > startLocalizer in DefaultContainerExecutor uses only the first localDir > to copy the token file; if that copy fails because the first localDir does not > have enough disk space, localization fails even though there is plenty of disk > space in the other localDirs. We see the following error > for this case: > {code} > 2014-09-13 23:33:25,171 WARN > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to > create app directory > /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 > java.io.IOException: mkdir of > /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed > at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062) > at > org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157) > at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721) > at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426) > at > 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987) > 2014-09-13 23:33:25,185 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > Localizer failed > java.io.FileNotFoundException: File > file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 > does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501) > at > org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111) > at > org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76) > at > org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.(ChecksumFs.java:344) > at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390) > at > org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577) > at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677) > at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at org.apache.hadoop.fs.FileContext.create(FileContext.java:673) > at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021) > at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102) > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987) > 2014-09-13 23:33:25,186 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: > Container container_1410663092546_0004_01_01 transitioned from > LOCALIZING to LOCALIZATION_FAILED > 2014-09-13 23:33:25,187 WARN > org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=cloudera > OPERATION=Container Finished - Failed TARGET=ContainerImpl > RESULT=FAILURE DESCRIPTION=Container failed with state: LOCALIZATION_FAILED > APPID=application_1410663092546_0004 > CONTAINERID=container_141066
[jira] [Updated] (YARN-2500) [YARN-796] Miscellaneous changes in ResourceManager to support labels
[ https://issues.apache.org/jira/browse/YARN-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2500: - Attachment: YARN-2500.patch Attached updated patch > [YARN-796] Miscellaneous changes in ResourceManager to support labels > - > > Key: YARN-2500 > URL: https://issues.apache.org/jira/browse/YARN-2500 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2500.patch, YARN-2500.patch, YARN-2500.patch, > YARN-2500.patch > > > This patch contains changes in the ResourceManager to support labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2496: - Attachment: YARN-2496.patch Attached updated patch > [YARN-796] Changes for capacity scheduler to support allocate resource > respect labels > - > > Key: YARN-2496 > URL: https://issues.apache.org/jira/browse/YARN-2496 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, > YARN-2496.patch, YARN-2496.patch > > > This JIRA includes: > - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other > queue options like capacity/maximum-capacity, etc. > - Include a "default-label-expression" option in the queue config; if an app > doesn't specify a label-expression, the queue's "default-label-expression" will be > used. > - Check whether labels can be accessed by the queue when submitting an app with > a label-expression to the queue or updating a ResourceRequest with a label-expression > - Check labels on the NM when trying to allocate a ResourceRequest on an NM with > a label-expression > - Respect labels when calculating headroom/user-limit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
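The queue-side label access check in the list above amounts to a subset test: every label named in the app's label-expression must be among the labels the queue is allowed to access. A minimal sketch, with hypothetical names (QueueLabelCheck, canAccess) rather than the actual CapacityScheduler API:

```java
import java.util.Set;

// Illustrative sketch of the label-access check described in YARN-2496:
// an app's label-expression is admitted only if every label it names is
// in the queue's accessible-label set. Names here are hypothetical.
public class QueueLabelCheck {
    static boolean canAccess(Set<String> queueAccessibleLabels,
                             Set<String> requestedLabels) {
        // subset test: all requested labels must be accessible to the queue
        return queueAccessibleLabels.containsAll(requestedLabels);
    }
}
```

The same predicate shape applies on the NM side, checking a ResourceRequest's label-expression against the labels a node actually carries.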
[jira] [Updated] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-1707: - Fix Version/s: 2.6.0 > Making the CapacityScheduler more dynamic > - > > Key: YARN-1707 > URL: https://issues.apache.org/jira/browse/YARN-1707 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Labels: capacity-scheduler > Fix For: 2.6.0 > > Attachments: YARN-1707.10.patch, YARN-1707.2.patch, > YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.6.patch, > YARN-1707.7.patch, YARN-1707.8.patch, YARN-1707.9.patch, YARN-1707.patch > > > The CapacityScheduler is rather static at the moment, and refreshqueue > provides a rather heavy-handed way to reconfigure it. Moving towards > long-running services (tracked in YARN-896) and to enable more advanced > admission control and resource parcelling, we need to make the > CapacityScheduler more dynamic. This is instrumental to the umbrella JIRA > YARN-1051. > Concretely, this requires the following changes: > * create queues dynamically > * destroy queues dynamically > * dynamically change queue parameters (e.g., capacity) > * modify refreshqueue validation to enforce sum(child.getCapacity()) <= 100% > instead of == 100% > We limit this to LeafQueues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)