[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060379#comment-14060379 ] Jian He commented on YARN-1408: ---
More comments after looking at the latest patch:
- Is it possible that schedulerAttempt here is null? e.g. preemption happens after the attempt completed.
{code}
SchedulerApplicationAttempt schedulerAttempt =
    getCurrentAttemptForContainer(rmContainer.getContainerId());
schedulerAttempt.recoverResourceRequests(requests);
{code}
- AbstractYarnScheduler#recoverResourceRequest: how about renaming it to recoverResourceRequestForContainer?
- Assert the size of the requests; the list can be empty, in which case the assertion inside the loop is silently skipped. The same applies to the CapacityScheduler test.
{code}
List<ResourceRequest> requests = rmContainer.getResourceRequests();

// Once recovered, the resource request will be present again in the app
for (ResourceRequest request : requests) {
  Assert.assertEquals(1,
      app.getResourceRequest(priority, request.getResourceName())
          .getNumContainers());
}
{code}
- Alternatively, calling warnOrKillContainer twice and setting WAIT_TIME_BEFORE_KILL to a small value may do the trick.
{code}
// Create a preempt event by sending a KILL event. In real cases,
// FairScheduler#warnOrKillContainer will perform the steps below.
ContainerStatus status = SchedulerUtils.createPreemptedContainerStatus(
    rmContainer.getContainerId(), SchedulerUtils.PREEMPTED_CONTAINER);
scheduler.recoverResourceRequest(rmContainer);
app.containerCompleted(rmContainer, status, RMContainerEventType.KILL);
{code}
> Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task > timeout for 30mins > -- > > Key: YARN-1408 > URL: https://issues.apache.org/jira/browse/YARN-1408 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: Yarn-1408.1.patch, Yarn-1408.10.patch, > Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.5.patch, > Yarn-1408.6.patch, Yarn-1408.7.patch, Yarn-1408.8.patch, Yarn-1408.9.patch, > Yarn-1408.patch > > > Capacity preemption is enabled as follows. > * yarn.resourcemanager.scheduler.monitor.enable= true , > * > yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy > Queue = a,b > Capacity of Queue A = 80% > Capacity of Queue B = 20% > Step 1: Assign a big jobA on queue a which uses full cluster capacity > Step 2: Submitted a jobB to queue b which would use less than 20% of cluster > capacity > JobA task which uses queue b capacity has been preempted and killed. > This caused below problem: > 1. New Container has got allocated for jobA in Queue A as per node update > from an NM. > 2. This container has been preempted immediately as per preemption. > Here ACQUIRED at KILLED Invalid State exception came when the next AM > heartbeat reached RM. > ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ACQUIRED at KILLED > This also caused the Task to go for a timeout for 30 minutes as this Container > was already killed by preemption. > attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
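A minimal sketch of the null guard and the non-empty assertion suggested in the comment above, assuming the AbstractYarnScheduler context from the quoted snippets; recoverResourceRequestForContainer is the rename proposed in the review, not necessarily what the committed patch uses:
{code}
// Scheduler side: guard against the race described above, where preemption
// happens after the application attempt has already completed and there is
// no current attempt to recover the requests into.
public synchronized void recoverResourceRequestForContainer(RMContainer rmContainer) {
  List<ResourceRequest> requests = rmContainer.getResourceRequests();
  SchedulerApplicationAttempt schedulerAttempt =
      getCurrentAttemptForContainer(rmContainer.getContainerId());
  if (schedulerAttempt != null && requests != null) {
    schedulerAttempt.recoverResourceRequests(requests);
  }
}

// Test side: assert the list is non-empty first, so the per-request assertions
// in the loop cannot be skipped silently when nothing was recovered.
List<ResourceRequest> requests = rmContainer.getResourceRequests();
Assert.assertFalse(requests.isEmpty());
for (ResourceRequest request : requests) {
  Assert.assertEquals(1,
      app.getResourceRequest(priority, request.getResourceName())
          .getNumContainers());
}
{code}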
[jira] [Commented] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext
[ https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060373#comment-14060373 ] Hadoop QA commented on YARN-2130: -
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12655497/YARN-2130.7.patch
against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 17 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4290//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4290//console
This message is automatically generated.
> Cleanup: Adding getRMAppManager, getQueueACLsManager, > getApplicationACLsManager to RMContext > > > Key: YARN-2130 > URL: https://issues.apache.org/jira/browse/YARN-2130 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, > YARN-2130.4.patch, YARN-2130.5.patch, YARN-2130.6.patch, YARN-2130.7.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2234) Incorrect description in RM audit logs while refreshing Admin ACL
[ https://issues.apache.org/jira/browse/YARN-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060366#comment-14060366 ] Kenji Kikushima commented on YARN-2234: ---
This patch contains a log message modification only, so I think no additional test is needed. I also ran the failed tests locally, and no errors occurred.
{noformat}
Running org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.42 sec - in org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
Running org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
Tests run: 40, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 168.308 sec - in org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
Running org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.725 sec - in org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
Running org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 75.733 sec - in org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
Running org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
Tests run: 22, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.253 sec - in org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
Running org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.783 sec - in org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched

Results :

Tests run: 105, Failures: 0, Errors: 0, Skipped: 0

[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 4:51.743s
[INFO] Finished at: Mon Jul 14 14:56:46 UTC 2014
[INFO] Final Memory: 31M/375M
[INFO]
{noformat}
> Incorrect description in RM audit logs while refreshing Admin ACL > - > > Key: YARN-2234 > URL: https://issues.apache.org/jira/browse/YARN-2234 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Varun Saxena >Assignee: Kenji Kikushima > Attachments: YARN-2234.patch > > > In method, AdminService#refreshAdminAcls(AdminService.java:446), failure RM > audit log, which is generated when RM is not active, has following > description : > "ResourceManager is not active. Can not refresh user-groups." > This should instead be changed to "ResourceManager is not active. Can not > refresh admin ACLs'". -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext
[ https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2130: - Attachment: YARN-2130.7.patch Rebased on trunk. > Cleanup: Adding getRMAppManager, getQueueACLsManager, > getApplicationACLsManager to RMContext > > > Key: YARN-2130 > URL: https://issues.apache.org/jira/browse/YARN-2130 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, > YARN-2130.4.patch, YARN-2130.5.patch, YARN-2130.6.patch, YARN-2130.7.patch > > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2234) Incorrect description in RM audit logs while refreshing Admin ACL
[ https://issues.apache.org/jira/browse/YARN-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060341#comment-14060341 ] Hadoop QA commented on YARN-2234: -
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12655491/YARN-2234.patch
against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4289//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4289//console
This message is automatically generated.
> Incorrect description in RM audit logs while refreshing Admin ACL > - > > Key: YARN-2234 > URL: https://issues.apache.org/jira/browse/YARN-2234 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Varun Saxena >Assignee: Kenji Kikushima > Attachments: YARN-2234.patch > > > In method, AdminService#refreshAdminAcls(AdminService.java:446), failure RM > audit log, which is generated when RM is not active, has following > description : > "ResourceManager is not active. Can not refresh user-groups." > This should instead be changed to "ResourceManager is not active. Can not > refresh admin ACLs'". -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2234) Incorrect description in RM audit logs while refreshing Admin ACL
[ https://issues.apache.org/jira/browse/YARN-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenji Kikushima reassigned YARN-2234: - Assignee: Kenji Kikushima > Incorrect description in RM audit logs while refreshing Admin ACL > - > > Key: YARN-2234 > URL: https://issues.apache.org/jira/browse/YARN-2234 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Varun Saxena >Assignee: Kenji Kikushima > Attachments: YARN-2234.patch > > > In method, AdminService#refreshAdminAcls(AdminService.java:446), failure RM > audit log, which is generated when RM is not active, has following > description : > "ResourceManager is not active. Can not refresh user-groups." > This should instead be changed to "ResourceManager is not active. Can not > refresh admin ACLs'". -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2234) Incorrect description in RM audit logs while refreshing Admin ACL
[ https://issues.apache.org/jira/browse/YARN-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenji Kikushima updated YARN-2234: -- Attachment: YARN-2234.patch Attached a patch. Changed audit log message to "ResourceManager is not active. Can not refresh admin ACLs." > Incorrect description in RM audit logs while refreshing Admin ACL > - > > Key: YARN-2234 > URL: https://issues.apache.org/jira/browse/YARN-2234 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Varun Saxena > Attachments: YARN-2234.patch > > > In method, AdminService#refreshAdminAcls(AdminService.java:446), failure RM > audit log, which is generated when RM is not active, has following > description : > "ResourceManager is not active. Can not refresh user-groups." > This should instead be changed to "ResourceManager is not active. Can not > refresh admin ACLs'". -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060258#comment-14060258 ] Yuliya Feldman commented on YARN-796: -
1) {quote}
Agree. What I meant is that we need to consider the performance of 2 things:
- Time to evaluate a label expression. IMO we need to add labels at the per-container level.
- Whether it is important to get the headroom, or how many nodes can be used for an expression. The simpler the expression, the easier it is for us to compute the results mentioned previously.
{quote}
Regarding the time to evaluate a label expression: we need to get some performance stats on how many ops we can process. I will try to get those performance numbers based on different levels of expression complexity.
I did not do anything to include label evaluation in the headroom calculation, so I don't have comments there.

2) bq. Do you have any ideas about what the API will look like?
It can be as simple as "yarn rmadmin -loadlabels ". I am not sure if you mean anything else.

3) bq. I think for different schedulers, we should specify queue-related parameters in different configurations. Let's get more ideas from the community about how to specify queue parameters before moving ahead.
I have some examples in the document for the Fair and Capacity Schedulers.
> Allow for (admin) labels on nodes and resource-requests > --- > > Key: YARN-796 > URL: https://issues.apache.org/jira/browse/YARN-796 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun C Murthy >Assignee: Wangda Tan > Attachments: LabelBasedScheduling.pdf, > Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch > > > It will be useful for admins to specify labels for nodes. Examples of labels > are OS, processor architecture etc. > We should expose these labels and allow applications to specify labels on > resource-requests. > Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins
[ https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060138#comment-14060138 ] Wangda Tan commented on YARN-1408: -- LGTM, +1 Thanks, > Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task > timeout for 30mins > -- > > Key: YARN-1408 > URL: https://issues.apache.org/jira/browse/YARN-1408 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: Yarn-1408.1.patch, Yarn-1408.10.patch, > Yarn-1408.2.patch, Yarn-1408.3.patch, Yarn-1408.4.patch, Yarn-1408.5.patch, > Yarn-1408.6.patch, Yarn-1408.7.patch, Yarn-1408.8.patch, Yarn-1408.9.patch, > Yarn-1408.patch > > > Capacity preemption is enabled as follows. > * yarn.resourcemanager.scheduler.monitor.enable= true , > * > yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy > Queue = a,b > Capacity of Queue A = 80% > Capacity of Queue B = 20% > Step 1: Assign a big jobA on queue a which uses full cluster capacity > Step 2: Submitted a jobB to queue b which would use less than 20% of cluster > capacity > JobA task which uses queue b capacity has been preempted and killed. > This caused below problem: > 1. New Container has got allocated for jobA in Queue A as per node update > from an NM. > 2. This container has been preempted immediately as per preemption. > Here ACQUIRED at KILLED Invalid State exception came when the next AM > heartbeat reached RM. > ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ACQUIRED at KILLED > This also caused the Task to go for a timeout for 30 minutes as this Container > was already killed by preemption. > attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060133#comment-14060133 ] Wangda Tan commented on YARN-796: -
Reply: Hi Yuliya,
Thanks for your reply. It's great to read your doc and discuss with you too. :) Please see my replies below.

1) bq. What probably needs to be evaluated is what nodes satisfy a final/effective LabelExpression, as nodes can come and go, labels on them can change
Agree. What I meant is that we need to consider the performance of 2 things:
* Time to evaluate a label expression. IMO we need to add labels at the per-container level.
* Whether it is important to get the headroom, or how many nodes can be used for an expression. The simpler the expression, the easier it is for us to compute the results mentioned previously.

2) bq. Let me understand it better: If application provides multiple labels they are "AND"ed and so only nodes that have the same set of labels or their superset will be used?
Yes. The reason I think this is important is that a label is treated as a tangible resource here. Imagine you are running an HBase master: you may want the node to be "stable", "large_memory", "for_long_running_service". Or if you run a scientific computing program, you want a node that has "GPU", "large_memory", "strong_cpu". It does not make sense to use "OR" in these cases.
To Sandy/Amit, do you have any specific use case for OR?
My basic feeling about supporting different OPs like "OR"/"NOT" here is that we may support them if they have a clear use case and are highly demanded. But we'd better not use combined expressions. If we use combined expressions, we need to add parentheses, which will increase the complexity of evaluating them. Let's hear more thoughts from the community about this.

3) bq. Yes - so far this is a procedure. Not sure what is "hard" here, but we can have some API to do it.
Do you have any ideas about what the API will look like?

4) bq. Agree - that today this file may be only relevant to RM. If it is stored as local file or by other means it is greater chance for it to be overwritten, lost in upgrade process.
Agree.

5) bq. And if we support this, it will be not sufficient to change isBlackListed at AppSchedulingInfo only in scheduler to make fair/capacity scheduler works. We may need to modify implementations of different schedulers.
Agree.

6) bq. Sure we can make them consistent, our thought process was that if you have multiple leaf queues that should share the same label/policy you can specify it on the parent level, so you don't need to "type" more then necessary
I think for different schedulers, we should specify queue-related parameters in different configurations. Let's get more ideas from the community about how to specify queue parameters before moving ahead. :)

Thanks,
Wangda
> Allow for (admin) labels on nodes and resource-requests > --- > > Key: YARN-796 > URL: https://issues.apache.org/jira/browse/YARN-796 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Arun C Murthy >Assignee: Wangda Tan > Attachments: LabelBasedScheduling.pdf, > Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch > > > It will be useful for admins to specify labels for nodes. Examples of labels > are OS, processor architecture etc. > We should expose these labels and allow applications to specify labels on > resource-requests. > Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
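To make the "AND" semantics described in point 2) above concrete, here is a small illustrative sketch. The class and method names are hypothetical, since the node-label API for YARN-796 was still under design at this point; the only behavior shown is that a node qualifies when its label set is a superset of the requested labels.
{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of "AND" label matching: every requested label must be
// present on the node; there are no OR / NOT operators.
public class LabelAndMatching {

  static boolean satisfies(Set<String> nodeLabels, Set<String> requestedLabels) {
    return nodeLabels.containsAll(requestedLabels);
  }

  public static void main(String[] args) {
    Set<String> node = new HashSet<>(Arrays.asList("GPU", "large_memory", "strong_cpu"));

    // true: the node's labels are a superset of the request
    System.out.println(satisfies(node,
        new HashSet<>(Arrays.asList("GPU", "large_memory"))));

    // false: the node is missing one of the requested labels
    System.out.println(satisfies(node,
        new HashSet<>(Arrays.asList("GPU", "for_long_running_service"))));
  }
}
{code}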
[jira] [Commented] (YARN-2274) FairScheduler: Add debug information about cluster capacity, availability and reservations
[ https://issues.apache.org/jira/browse/YARN-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060129#comment-14060129 ] Hudson commented on YARN-2274: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1830 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1830/]) YARN-2274. FairScheduler: Add debug information about cluster capacity, availability and reservations. (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1609942) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java > FairScheduler: Add debug information about cluster capacity, availability and > reservations > -- > > Key: YARN-2274 > URL: https://issues.apache.org/jira/browse/YARN-2274 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.4.1 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Trivial > Fix For: 2.6.0 > > Attachments: yarn-2274-1.patch, yarn-2274-2.patch, yarn-2274-3.patch > > > FairScheduler logs have little information on cluster capacity and > availability. Need this information to debug production issues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2274) FairScheduler: Add debug information about cluster capacity, availability and reservations
[ https://issues.apache.org/jira/browse/YARN-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060112#comment-14060112 ] Hudson commented on YARN-2274: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1803 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1803/]) YARN-2274. FairScheduler: Add debug information about cluster capacity, availability and reservations. (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1609942) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java > FairScheduler: Add debug information about cluster capacity, availability and > reservations > -- > > Key: YARN-2274 > URL: https://issues.apache.org/jira/browse/YARN-2274 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.4.1 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Trivial > Fix For: 2.6.0 > > Attachments: yarn-2274-1.patch, yarn-2274-2.patch, yarn-2274-3.patch > > > FairScheduler logs have little information on cluster capacity and > availability. Need this information to debug production issues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2274) FairScheduler: Add debug information about cluster capacity, availability and reservations
[ https://issues.apache.org/jira/browse/YARN-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060074#comment-14060074 ] Hudson commented on YARN-2274: -- FAILURE: Integrated in Hadoop-Yarn-trunk #611 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/611/]) YARN-2274. FairScheduler: Add debug information about cluster capacity, availability and reservations. (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1609942) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java > FairScheduler: Add debug information about cluster capacity, availability and > reservations > -- > > Key: YARN-2274 > URL: https://issues.apache.org/jira/browse/YARN-2274 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.4.1 >Reporter: Karthik Kambatla >Assignee: Karthik Kambatla >Priority: Trivial > Fix For: 2.6.0 > > Attachments: yarn-2274-1.patch, yarn-2274-2.patch, yarn-2274-3.patch > > > FairScheduler logs have little information on cluster capacity and > availability. Need this information to debug production issues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart
[ https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14060041#comment-14060041 ] Tsuyoshi OZAWA commented on YARN-2229: -- The test failure is not related, and javac warning is caused by using {{getId}}. > ContainerId can overflow with RM restart > > > Key: YARN-2229 > URL: https://issues.apache.org/jira/browse/YARN-2229 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: YARN-2229.1.patch, YARN-2229.2.patch, YARN-2229.2.patch, > YARN-2229.3.patch, YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, > YARN-2229.7.patch, YARN-2229.8.patch > > > On YARN-2052, we changed containerId format: upper 10 bits are for epoch, > lower 22 bits are for sequence number of Ids. This is for preserving > semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, > {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and > {{ConverterUtils#toContainerId}}. One concern is epoch can overflow after RM > restarts 1024 times. > To avoid the problem, its better to make containerId long. We need to define > the new format of container Id with preserving backward compatibility on this > JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
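A minimal sketch of the overflow described in the YARN-2229 summary above, assuming the YARN-2052 layout of a 10-bit epoch and a 22-bit sequence number packed into a 32-bit int; the class and helper names are made up for illustration and are not the actual ContainerId code. With only 10 bits, the epoch wraps after 2^10 = 1024 RM restarts and ids from different epochs collide, which is what motivates widening the id to a long.
{code}
// Illustrative only: pack(epoch, sequence) with a 10-bit epoch in the upper
// bits and a 22-bit sequence number in the lower bits of a 32-bit int.
public class ContainerIdOverflowDemo {
  private static final int SEQUENCE_BITS = 22;
  private static final int SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1;

  static int pack(int epoch, int sequence) {
    return (epoch << SEQUENCE_BITS) | (sequence & SEQUENCE_MASK);
  }

  public static void main(String[] args) {
    System.out.println(pack(0, 5));    // epoch 0, sequence 5
    System.out.println(pack(1024, 5)); // prints the same value: epoch 1024 has wrapped to 0
  }
}
{code}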