[jira] [Commented] (YARN-4102) Add a "skip existing table" mode for timeline schema creator
[ https://issues.apache.org/jira/browse/YARN-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740238#comment-14740238 ] Hadoop QA commented on YARN-4102: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 7s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 10s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 20s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 17s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 51s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 1m 35s | Tests passed in hadoop-yarn-server-timelineservice. 
| | | | 40m 0s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12755269/YARN-4102-YARN-2928.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / e6afe26 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/9085/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9085/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9085/console | This message was automatically generated. > Add a "skip existing table" mode for timeline schema creator > > > Key: YARN-4102 > URL: https://issues.apache.org/jira/browse/YARN-4102 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4102-YARN-2928.001.patch, > YARN-4102-YARN-2928.002.patch, YARN-4102-YARN-2928.003.patch, > YARN-4102-YARN-2928.004.patch > > > When debugging timeline POCs, we may need to create hbase tables that are > added in some ongoing patches. Right now, our schema creator will exit when > it hits one existing table. While this is a correct behavior with end users, > this introduces much trouble in debugging POCs: every time we have to disable > all existing tables, drop them, run the schema creator to generate all > tables, and regenerate all test data. > Maybe we'd like to add an "incremental" mode so that the creator will only > create non-existing tables? This is pretty handy in deploying our POCs. Of > course, consistency has to be kept in mind across tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
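The "incremental" mode proposed in YARN-4102 above can be sketched in a few lines. This is a hypothetical illustration only: existing tables are modeled as a plain Set<String>, whereas the real schema creator would call the HBase Admin API (Admin.tableExists / Admin.createTable); the class and method names here are invented for the sketch.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of an incremental ("skip existing table") schema
// creator. Existing tables are modeled as a Set<String>; the real tool
// talks to HBase through the Admin API instead.
public class SchemaCreatorSketch {
  private final Set<String> existingTables;

  public SchemaCreatorSketch(Set<String> existingTables) {
    this.existingTables = existingTables;
  }

  /**
   * Creates each table that does not yet exist. With skipExisting=false the
   * creator fails fast on the first existing table (the current behavior);
   * with skipExisting=true it skips that table and continues.
   */
  public Set<String> createTables(List<String> tables, boolean skipExisting) {
    Set<String> created = new LinkedHashSet<>();
    for (String table : tables) {
      if (existingTables.contains(table)) {
        if (skipExisting) {
          continue; // incremental mode: leave the existing table untouched
        }
        throw new IllegalStateException("Table already exists: " + table);
      }
      existingTables.add(table);
      created.add(table);
    }
    return created;
  }
}
```

The consistency concern raised in the description still applies: skipping a table assumes its existing schema matches what the creator would have produced.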
[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740221#comment-14740221 ] Bibin A Chundatt commented on YARN-4126: Hi [~jianhe] The test classes that required updates are: TestClientRMService, TestRMDelegationTokens, and TestRMWebServicesDelegationTokens. All token-related test cases were written for non-secure mode, so after the fix disabling tokens in non-secure mode they all started failing: under the new condition, a delegation token is issued only in secure mode with AuthenticationMethod=kerberos. Also, {{TestClientRMService}} contains test cases for both non-secure mode (12) and secure mode (8). For the token-related test cases I had to set up the *Kerberos+Secured* state; the method *initializeUserGroupSecureMode* was created for this, which sets the authentication method to kerberos on {{UserGroupInformation}}. Since the *UserGroupInformation* state had to be set for only a few test cases, initializing it in a before-class method was not an option. If you have any other suggestions, please do share and I will try my best. > RM should not issue delegation tokens in unsecure mode > -- > > Key: YARN-4126 > URL: https://issues.apache.org/jira/browse/YARN-4126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, > 0003-YARN-4126.patch, 0004-YARN-4126.patch, 0005-YARN-4126.patch > > > ClientRMService#getDelegationToken is currently returning a delegation token > in insecure mode. We should not return the token if it's in insecure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
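The condition described in YARN-4126 above (issue a delegation token only in secure mode with Kerberos authentication) boils down to a small predicate. This is a hedged sketch only: the real check lives in ClientRMService#getDelegationToken and consults UserGroupInformation; the enum and method names below are illustrative, not the actual API.

```java
// Hypothetical sketch of the new delegation-token condition: a token is
// issued only when security is enabled AND the caller authenticated via
// Kerberos. The enum and method names are invented for illustration.
public class DelegationTokenPolicy {
  public enum AuthMethod { SIMPLE, KERBEROS, TOKEN }

  public static boolean shouldIssueToken(boolean securityEnabled, AuthMethod method) {
    return securityEnabled && method == AuthMethod.KERBEROS;
  }
}
```

This also shows why the pre-existing tests broke: they ran with security disabled (SIMPLE auth), where the predicate is now false.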
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740214#comment-14740214 ] Hadoop QA commented on YARN-2005: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 29s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. | | {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 13s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 27s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 27s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 39s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 1s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 54m 33s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 102m 1s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12755286/YARN-2005.009.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f103a70 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9084/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9084/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9084/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9084/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9084/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9084/console | This message was automatically generated. > Blacklisting support for scheduling AMs > --- > > Key: YARN-2005 > URL: https://issues.apache.org/jira/browse/YARN-2005 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 0.23.10, 2.4.0 >Reporter: Jason Lowe >Assignee: Anubhav Dhoot > Attachments: YARN-2005.001.patch, YARN-2005.002.patch, > YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, > YARN-2005.006.patch, YARN-2005.006.patch, YARN-2005.007.patch, > YARN-2005.008.patch, YARN-2005.009.patch > > > It would be nice if the RM supported blacklisting a node for an AM launch > after the same node fails a configurable number of AM attempts. 
This would > be similar to the blacklisting support for scheduling task attempts in the > MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.
[ https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740114#comment-14740114 ] Hadoop QA commented on YARN-1651: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 18m 19s | Findbugs (version ) appears to be broken on YARN-1197. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 23 new or modified test files. | | {color:red}-1{color} | javac | 8m 18s | The applied patch generated 1 additional warning messages. | | {color:green}+1{color} | javadoc | 11m 39s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 30s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 43m 12s | The patch has 177 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 48s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 46s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 9m 13s | The patch appears to introduce 7 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 9m 52s | Tests passed in hadoop-mapreduce-client-app. | | {color:green}+1{color} | tools/hadoop tests | 1m 0s | Tests passed in hadoop-sls. | | {color:red}-1{color} | yarn tests | 6m 43s | Tests failed in hadoop-yarn-client. | | {color:green}+1{color} | yarn tests | 1m 59s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 0m 26s | Tests passed in hadoop-yarn-server-common. | | {color:green}+1{color} | yarn tests | 55m 30s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 171m 51s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-common | | Failed unit tests | hadoop.yarn.client.api.impl.TestYarnClient | | Timed out tests | org.apache.hadoop.yarn.client.api.impl.TestNMClient | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12755257/YARN-1651-6.YARN-1197.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | YARN-1197 / f86eae1 | | javac | https://builds.apache.org/job/PreCommit-YARN-Build/9079/artifact/patchprocess/diffJavacWarnings.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/9079/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9079/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html | | hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-YARN-Build/9079/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt | | hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/9079/artifact/patchprocess/testrun_hadoop-sls.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/9079/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9079/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9079/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9079/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9079/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 
x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9079/console | This message was automatically generated. > CapacityScheduler side changes to support increase/decrease container > resource. > --- > > Key: YARN-1651 > URL: https://issues.apache.org/jira/browse/YARN-1651 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-1651-1.YARN-1197.patch, > YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, > YARN-1651-4.YARN-1197.patch, YARN-1651-5.YARN-1197.patch, > YARN-1651-6.YARN-1197.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2513) Host framework UIs in YARN for use with the ATS
[ https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740093#comment-14740093 ] Jonathan Eagles commented on YARN-2513: --- I'll need to have a look at this to see why two UIs don't work. > Host framework UIs in YARN for use with the ATS > --- > > Key: YARN-2513 > URL: https://issues.apache.org/jira/browse/YARN-2513 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Labels: 2.6.1-candidate > Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, > YARN-2513.v3.patch > > > Allow for pluggable UIs as described by TEZ-8. YARN can provide the > infrastructure to host JavaScript and possibly Java UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4145) Make RMHATestBase abstract so its not run when running all tests under that namespace
[ https://issues.apache.org/jira/browse/YARN-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740069#comment-14740069 ] Hadoop QA commented on YARN-4145: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 7m 2s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 6s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 54s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 30s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 54m 4s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 74m 4s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler | | | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler | | | org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12755259/YARN-4145.001.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / f103a70 | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9081/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9081/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9081/console | This message was automatically generated. > Make RMHATestBase abstract so its not run when running all tests under that > namespace > - > > Key: YARN-4145 > URL: https://issues.apache.org/jira/browse/YARN-4145 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Minor > Attachments: YARN-4145.001.patch > > > Make it abstract to avoid running it as a test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4120) FSAppAttempt.getResourceUsage() should not take preemptedResource into account
[ https://issues.apache.org/jira/browse/YARN-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739974#comment-14739974 ] Xianyin Xin commented on YARN-4120: --- Linking to YARN-4134; the two can be solved together. > FSAppAttempt.getResourceUsage() should not take preemptedResource into account > -- > > Key: YARN-4120 > URL: https://issues.apache.org/jira/browse/YARN-4120 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Xianyin Xin > > When computing resource usage for Schedulables, the following code is involved, > {{FSAppAttempt.getResourceUsage}}, > {code} > public Resource getResourceUsage() { > return Resources.subtract(getCurrentConsumption(), getPreemptedResources()); > } > {code} > and this value is aggregated up to FSLeafQueues and FSParentQueues. In my > opinion, taking {{preemptedResource}} into account here is not reasonable, for > two main reasons: > # it is something in the future, i.e., even though these resources are marked as > preempted, they are currently used by the app, and they will be > subtracted from {{currentConsumption}} once the preemption is finished. It is > not reasonable to account for them ahead of time. > # there is another problem here; consider the following case, > {code} > root >/\ > queue1 queue2 > /\ > queue1.3, queue1.4 > {code} > Suppose queue1.3 needs resources and can preempt resources from queue1.4; > the preemption happens in the interior of queue1. But when computing the resource usage of queue1, {{queue1.resourceUsage = its_current_resource_usage - > preemption}} according to the current code, which is unfair to queue2 when > allocating resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
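The unfairness argued in YARN-4120 above can be shown with a small numeric sketch. This is illustrative only: resources are reduced to a single long, whereas the real code works with the Resource/Resources classes; the class and method names are invented.

```java
// Illustrative sketch, with resources reduced to a single long. Under the
// current behavior, queue1's reported usage drops by the preempted amount
// even though those containers are still running, so queue1 looks smaller
// than it really is while the scheduler compares queues for allocation.
public class ResourceUsageSketch {
  // Current behavior: usage = current consumption - preempted resources.
  static long usageWithSubtraction(long currentConsumption, long preempted) {
    return currentConsumption - preempted;
  }

  // Proposed behavior: report what is actually consumed right now;
  // the preempted amount is subtracted naturally once preemption completes.
  static long usageWithoutSubtraction(long currentConsumption, long preempted) {
    return currentConsumption;
  }
}
```

For example, if queue1 currently consumes 100 units and 20 of those are marked for internal preemption by queue1.3, the current code reports 80 while the containers still hold 100, understating queue1 relative to queue2.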
[jira] [Updated] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs
[ https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-3985: Attachment: YARN-3985.001.patch Added a patch that calls into the state store, plus a unit test that verifies that after state recovery the new RM gets the reservations saved by the previous RM. > Make ReservationSystem persist state using RMStateStore reservation APIs > - > > Key: YARN-3985 > URL: https://issues.apache.org/jira/browse/YARN-3985 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3985.001.patch > > > YARN-3736 adds the RMStateStore apis to store and load reservation state. > This jira adds the actual storing of state from ReservationSystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3985) Make ReservationSystem persist state using RMStateStore reservation APIs
[ https://issues.apache.org/jira/browse/YARN-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739939#comment-14739939 ] Anubhav Dhoot commented on YARN-3985: - Since updateReservation does an add and a remove, we do not need to update reservation state in the state store. I can remove it if needed, in either this or a separate patch. > Make ReservationSystem persist state using RMStateStore reservation APIs > - > > Key: YARN-3985 > URL: https://issues.apache.org/jira/browse/YARN-3985 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-3985.001.patch > > > YARN-3736 adds the RMStateStore apis to store and load reservation state. > This jira adds the actual storing of state from ReservationSystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-2005: Attachment: YARN-2005.009.patch Addressed feedback > Blacklisting support for scheduling AMs > --- > > Key: YARN-2005 > URL: https://issues.apache.org/jira/browse/YARN-2005 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 0.23.10, 2.4.0 >Reporter: Jason Lowe >Assignee: Anubhav Dhoot > Attachments: YARN-2005.001.patch, YARN-2005.002.patch, > YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, > YARN-2005.006.patch, YARN-2005.006.patch, YARN-2005.007.patch, > YARN-2005.008.patch, YARN-2005.009.patch > > > It would be nice if the RM supported blacklisting a node for an AM launch > after the same node fails a configurable number of AM attempts. This would > be similar to the blacklisting support for scheduling task attempts in the > MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4102) Add a "skip existing table" mode for timeline schema creator
[ https://issues.apache.org/jira/browse/YARN-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739910#comment-14739910 ] Sangjin Lee commented on YARN-4102: --- The latest patch (v.4) LGTM. Once jenkins is green, I'll commit it. > Add a "skip existing table" mode for timeline schema creator > > > Key: YARN-4102 > URL: https://issues.apache.org/jira/browse/YARN-4102 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4102-YARN-2928.001.patch, > YARN-4102-YARN-2928.002.patch, YARN-4102-YARN-2928.003.patch, > YARN-4102-YARN-2928.004.patch > > > When debugging timeline POCs, we may need to create hbase tables that are > added in some ongoing patches. Right now, our schema creator will exit when > it hits one existing table. While this is a correct behavior with end users, > this introduces much trouble in debugging POCs: every time we have to disable > all existing tables, drop them, run the schema creator to generate all > tables, and regenerate all test data. > Maybe we'd like to add an "incremental" mode so that the creator will only > create non-existing tables? This is pretty handy in deploying our POCs. Of > course, consistency has to be kept in mind across tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4102) Add a "skip existing table" mode for timeline schema creator
[ https://issues.apache.org/jira/browse/YARN-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-4102: Attachment: YARN-4102-YARN-2928.004.patch Sorry for the delay. Here's the updated patch. Thanks folks! > Add a "skip existing table" mode for timeline schema creator > > > Key: YARN-4102 > URL: https://issues.apache.org/jira/browse/YARN-4102 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4102-YARN-2928.001.patch, > YARN-4102-YARN-2928.002.patch, YARN-4102-YARN-2928.003.patch, > YARN-4102-YARN-2928.004.patch > > > When debugging timeline POCs, we may need to create hbase tables that are > added in some ongoing patches. Right now, our schema creator will exit when > it hits one existing table. While this is a correct behavior with end users, > this introduces much trouble in debugging POCs: every time we have to disable > all existing tables, drop them, run the schema creator to generate all > tables, and regenerate all test data. > Maybe we'd like to add an "incremental" mode so that the creator will only > create non-existing tables? This is pretty handy in deploying our POCs. Of > course, consistency has to be kept in mind across tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3901) Populate flow run data in the flow_run & flow activity tables
[ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C updated YARN-3901: - Attachment: YARN-3901-YARN-2928.6.patch Uploading patch v6, which addresses [~jrottinghuis]'s and [~sjlee0]'s discussion points about timestamp values being in milliseconds/nanoseconds. Each cell will now have a timestamp that is multiplied by 1000. The timestamp of the cells in the flow run table will also include the last 3 digits of the appId's id; that way we take care of collisions in this table. The read function ColumnHelper#readResultsWithTimestamps accordingly truncates the last 3 digits of the cell timestamp value. I checked that all the tests in timelineservice are passing. > Populate flow run data in the flow_run & flow activity tables > - > > Key: YARN-3901 > URL: https://issues.apache.org/jira/browse/YARN-3901 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-3901-YARN-2928.1.patch, > YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, > YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch, > YARN-3901-YARN-2928.6.patch > > > As per the schema proposed in YARN-3815 in > https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf > filing jira to track creation and population of data in the flow run table. > Some points that are being considered: > - Stores per flow run information aggregated across applications, flow version > RM’s collector writes to on app creation and app completion > - Per App collector writes to it for metric updates at a slower frequency > than the metric updates to application table > primary key: cluster ! user ! flow ! flow run id > - Only the latest version of flow-level aggregated metrics will be kept, even > if the entity and application level keep a timeseries. 
> - The running_apps column will be incremented on app creation, and > decremented on app completion. > - For min_start_time the RM writer will simply write a value with the tag for > the applicationId. A coprocessor will return the min value of all written > values. - > - Upon flush and compactions, the min value between all the cells of this > column will be written to the cell without any tag (empty tag) and all the > other cells will be discarded. > - Ditto for the max_end_time, but then the max will be kept. > - Tags are represented as #type:value. The type can be not set (0), or can > indicate running (1) or complete (2). In those cases (for metrics) only > complete app metrics are collapsed on compaction. > - The m! values are aggregated (summed) upon read. Only when applications are > completed (indicated by tag type 2) can the values be collapsed. > - The application ids that have completed and been aggregated into the flow > numbers are retained in a separate column for historical tracking: we don’t > want to re-aggregate for those upon replay > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
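The cell-timestamp scheme described in YARN-3901 above (millisecond timestamp multiplied by 1000 with the last 3 digits of the application id folded in, truncated back off on read) can be sketched as below. This is a hedged illustration: the real logic lives in the timeline service's ColumnHelper and the flow run coprocessor, and the class and method names here are invented.

```java
// Hypothetical sketch of the collision-avoidance scheme described above.
// Write path: shift the millisecond timestamp left by 3 decimal digits and
// add the last 3 digits of the application id, so two apps writing in the
// same millisecond get distinct cell timestamps.
// Read path: truncate those 3 digits to recover the original timestamp.
public class FlowRunTimestamps {
  static long encode(long timestampMillis, long appId) {
    return timestampMillis * 1000L + (appId % 1000L);
  }

  static long decode(long cellTimestamp) {
    return cellTimestamp / 1000L; // drop the appId suffix
  }
}
```

Note the implicit constraint this sketch makes visible: the shifted value must still fit in the signed 64-bit timestamp HBase stores per cell, which holds comfortably for millisecond epochs times 1000.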
[jira] [Commented] (YARN-1644) RM-NM protocol changes and NodeStatusUpdater implementation to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739800#comment-14739800 ] MENG DING commented on YARN-1644: - Hi, [~leftnoteasy] There are 7 findbugs warnings, but they already existed before the patch. This patch does not generate new findbugs warnings. I took a quick look at some of the warnings: * The warnings in {{NodeStatusPBImpl}} are most likely because getContainersUtilization/setContainersUtilization/getNodeUtilization/setNodeUtilization are not synchronized * The warnings in {{WebServices}} are probably because of potential NPE. I will open a ticket to fix them. Meng > RM-NM protocol changes and NodeStatusUpdater implementation to support > container resizing > - > > Key: YARN-1644 > URL: https://issues.apache.org/jira/browse/YARN-1644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Wangda Tan >Assignee: MENG DING > Fix For: YARN-1197 > > Attachments: YARN-1644-YARN-1197.4.patch, > YARN-1644-YARN-1197.5.patch, YARN-1644-YARN-1197.6.patch, YARN-1644.1.patch, > YARN-1644.2.patch, YARN-1644.3.patch, yarn-1644.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4145) Make RMHATestBase abstract so its not run when running all tests under that namespace
[ https://issues.apache.org/jira/browse/YARN-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4145: Attachment: YARN-4145.001.patch > Make RMHATestBase abstract so its not run when running all tests under that > namespace > - > > Key: YARN-4145 > URL: https://issues.apache.org/jira/browse/YARN-4145 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Minor > Attachments: YARN-4145.001.patch > > > Trivial patch to make it abstract -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4145) Make RMHATestBase abstract so its not run when running all tests under that namespace
[ https://issues.apache.org/jira/browse/YARN-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-4145: Description: Make it abstract to avoid running it as a test (was: Trivial patch to make it abstract) > Make RMHATestBase abstract so its not run when running all tests under that > namespace > - > > Key: YARN-4145 > URL: https://issues.apache.org/jira/browse/YARN-4145 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot >Priority: Minor > Attachments: YARN-4145.001.patch > > > Make it abstract to avoid running it as a test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4145) Make RMHATestBase abstract so its not run when running all tests under that namespace
Anubhav Dhoot created YARN-4145: --- Summary: Make RMHATestBase abstract so its not run when running all tests under that namespace Key: YARN-4145 URL: https://issues.apache.org/jira/browse/YARN-4145 Project: Hadoop YARN Issue Type: Improvement Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Minor Trivial patch to make it abstract -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.
[ https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1651: - Attachment: (was: YARN-1651-6.YARN-1197.patch) > CapacityScheduler side changes to support increase/decrease container > resource. > --- > > Key: YARN-1651 > URL: https://issues.apache.org/jira/browse/YARN-1651 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-1651-1.YARN-1197.patch, > YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, > YARN-1651-4.YARN-1197.patch, YARN-1651-5.YARN-1197.patch, > YARN-1651-6.YARN-1197.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.
[ https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1651: - Attachment: YARN-1651-6.YARN-1197.patch Found a test failure in the ver.6 patch; removed and re-added the patch before anybody looked at it... > CapacityScheduler side changes to support increase/decrease container > resource. > --- > > Key: YARN-1651 > URL: https://issues.apache.org/jira/browse/YARN-1651 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-1651-1.YARN-1197.patch, > YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, > YARN-1651-4.YARN-1197.patch, YARN-1651-5.YARN-1197.patch, > YARN-1651-6.YARN-1197.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.
[ https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1651: - Attachment: YARN-1651-6.YARN-1197.patch Uploaded ver.6 patch. > CapacityScheduler side changes to support increase/decrease container > resource. > --- > > Key: YARN-1651 > URL: https://issues.apache.org/jira/browse/YARN-1651 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-1651-1.YARN-1197.patch, > YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, > YARN-1651-4.YARN-1197.patch, YARN-1651-5.YARN-1197.patch, > YARN-1651-6.YARN-1197.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.
[ https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739742#comment-14739742 ] Wangda Tan commented on YARN-1651: -- Thanks for the review, [~jianhe]! bq. I think for now we can fail the allocate call explicitly on those very clear situations in checkAndNormalizeContainerChangeRequest ?, e.g. the situation that rmContainer doesn't exist That's more explicit to users; digging through logs is not easy for an application writer. Done; we now check it in both the AMS and the Scheduler, and the exception is thrown in the AMS. We do both checks because the AMS doesn't acquire the scheduler lock, so it is still possible that the RMContainer state changed by the time it is added to the scheduler. bq. RMNodeImpl#toBeDecreasedContainers - no need to be a map, it can be a list ? and therefore NodeHeartBeatResponse and Impl change is not needed; similarly nmReportedIncreasedContainers can be a list. This is to avoid the AM decreasing the same container multiple times between NM heartbeats, a rare edge case. Similarly for NM-reported increasedContainers: if we decouple NM heartbeat and scheduler allocation, a container could be increased multiple times before the scheduler looks at the NM. bq. When decreasing a container, should it send RMNodeDecreaseContainerEvent too ? Done; added a test to confirm this as well. bq. looks like when decreasing reservedIncreasedContainer, it will unreserve the whole extra reserved resource, should it only unreserve the extra resources being decreased ? Decreasing a container means lowering its resource below the confirmed resource. If a container is 2G and the AM asks to increase it to 4G, it can only be decreased to less than 2G before the increase is issued. So I think we need to unreserve the whole extra reserved resource. bq. In general, I think we should be able to decrease/increase a regular reserved container or an increasedReservedContainer ? 
Container reservation is an internal state of the scheduler; the AM doesn't know about the reserved container at all, so for now I think we don't need to expose that to the user. bq. allocate call is specifically marked as noLock, but now every allocate call holds the global scheduler lock which is too expensive. we can move decreaseContainer to application itself. DecreaseContainer is the same as completedContainer: both acquire the scheduler lock and the queue lock. I think we can optimize this in the future by adding them to something like a "pendingReleased" list that is traversed periodically. I added comments to CS#allocate to explain this; the "NoLock" is not 100% accurate. And addressed all other comments. [~mding] Comment addressed. > CapacityScheduler side changes to support increase/decrease container > resource. > --- > > Key: YARN-1651 > URL: https://issues.apache.org/jira/browse/YARN-1651 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-1651-1.YARN-1197.patch, > YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, > YARN-1651-4.YARN-1197.patch, YARN-1651-5.YARN-1197.patch, > YARN-1651-6.YARN-1197.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
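The decrease rule discussed above can be sketched as a single validity check. This is a hypothetical illustration of the semantics described in the comment (the names are not the actual CapacityScheduler API): a decrease must target a value strictly below the container's last confirmed resource, and a pending increase (e.g. 2G to 4G) does not raise the confirmed size until it is actually issued.

```java
public class ContainerResizeSketch {
    // A decrease request is valid only if the target is positive and
    // strictly below the confirmed (not the pending-increase) resource.
    static boolean isValidDecrease(int confirmedMb, int targetMb) {
        return targetMb > 0 && targetMb < confirmedMb;
    }

    public static void main(String[] args) {
        int confirmed = 2048; // container confirmed at 2G; a 4G increase may be pending
        System.out.println(isValidDecrease(confirmed, 1024)); // true: below confirmed
        System.out.println(isValidDecrease(confirmed, 3072)); // false: above confirmed
    }
}
```

Because the decrease can only land below the confirmed size, any extra resource reserved for a pending increase is no longer needed, which is why the whole extra reservation is unreserved.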
[jira] [Updated] (YARN-3975) WebAppProxyServlet should not redirect to RM page if AHS is enabled
[ https://issues.apache.org/jira/browse/YARN-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-3975: Attachment: YARN-3975.7.patch [~jlowe] Thanks for taking a look. I have updated the patch and incorporated your comments. Can you please have another look? > WebAppProxyServlet should not redirect to RM page if AHS is enabled > --- > > Key: YARN-3975 > URL: https://issues.apache.org/jira/browse/YARN-3975 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-3975.2.b2.patch, YARN-3975.3.patch, > YARN-3975.4.patch, YARN-3975.5.patch, YARN-3975.6.patch, YARN-3975.7.patch > > > WebAppProxyServlet should be updated to handle the case when the app report > doesn't have a tracking URL and the Application History Server is enabled. > As we would have already tried the RM and got the > ApplicationNotFoundException we should not direct the user to the RM app page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739708#comment-14739708 ] Anubhav Dhoot commented on YARN-2005: - Added YARN-4144 to add the node that causes LaunchFailedTransition also to the AM blacklist > Blacklisting support for scheduling AMs > --- > > Key: YARN-2005 > URL: https://issues.apache.org/jira/browse/YARN-2005 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 0.23.10, 2.4.0 >Reporter: Jason Lowe >Assignee: Anubhav Dhoot > Attachments: YARN-2005.001.patch, YARN-2005.002.patch, > YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, > YARN-2005.006.patch, YARN-2005.006.patch, YARN-2005.007.patch, > YARN-2005.008.patch > > > It would be nice if the RM supported blacklisting a node for an AM launch > after the same node fails a configurable number of AM attempts. This would > be similar to the blacklisting support for scheduling task attempts in the > MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4144) Add NM that causes LaunchFailedTransition to blacklist
Anubhav Dhoot created YARN-4144: --- Summary: Add NM that causes LaunchFailedTransition to blacklist Key: YARN-4144 URL: https://issues.apache.org/jira/browse/YARN-4144 Project: Hadoop YARN Issue Type: Improvement Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot During discussion of YARN-2005 we need to add more cases where blacklisting can occur. This tracks adding any failures in launch via LaunchFailedTransition to also contribute to blacklisting -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739700#comment-14739700 ] Anubhav Dhoot commented on YARN-2005: - [~sunilg] thats a good suggestion. Added a followup for this YARN-4143 > Blacklisting support for scheduling AMs > --- > > Key: YARN-2005 > URL: https://issues.apache.org/jira/browse/YARN-2005 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 0.23.10, 2.4.0 >Reporter: Jason Lowe >Assignee: Anubhav Dhoot > Attachments: YARN-2005.001.patch, YARN-2005.002.patch, > YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, > YARN-2005.006.patch, YARN-2005.006.patch, YARN-2005.007.patch, > YARN-2005.008.patch > > > It would be nice if the RM supported blacklisting a node for an AM launch > after the same node fails a configurable number of AM attempts. This would > be similar to the blacklisting support for scheduling task attempts in the > MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4143) Optimize the check for AMContainer allocation needed by blacklisting and ContainerType
Anubhav Dhoot created YARN-4143: --- Summary: Optimize the check for AMContainer allocation needed by blacklisting and ContainerType Key: YARN-4143 URL: https://issues.apache.org/jira/browse/YARN-4143 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot In YARN-2005 there are checks made to determine if the allocation is for an AM container. This happens in every allocate call and should be optimized away since it changes only once per SchedulerApplicationAttempt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4143) Optimize the check for AMContainer allocation needed by blacklisting and ContainerType
[ https://issues.apache.org/jira/browse/YARN-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-4143: --- Assignee: Anubhav Dhoot > Optimize the check for AMContainer allocation needed by blacklisting and > ContainerType > -- > > Key: YARN-4143 > URL: https://issues.apache.org/jira/browse/YARN-4143 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > > In YARN-2005 there are checks made to determine if the allocation is for an > AM container. This happens in every allocate call and should be optimized > away since it changes only once per SchedulerApplicationAttempt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
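The optimization YARN-4143 asks for can be sketched simply: since "is this allocation for the AM container?" can only change once per SchedulerApplicationAttempt, the answer can be cached after the first computation instead of being recomputed on every allocate call. The class below is an illustrative stand-in, not the actual RM code:

```java
public class AmCheckCacheSketch {
    // Simplified stand-in for SchedulerApplicationAttempt.
    static class AttemptSketch {
        private Boolean amContainerDecided; // null until first computed
        private int expensiveChecks;

        boolean isWaitingForAmContainer() {
            if (amContainerDecided == null) {
                amContainerDecided = expensiveLookup(); // computed at most once
            }
            return amContainerDecided;
        }

        private boolean expensiveLookup() {
            expensiveChecks++; // e.g. a trip through the RMAppAttempt state
            return true;
        }

        int checksPerformed() { return expensiveChecks; }
    }

    public static void main(String[] args) {
        AttemptSketch a = new AttemptSketch();
        for (int i = 0; i < 1000; i++) {
            a.isWaitingForAmContainer(); // called once per allocate
        }
        System.out.println(a.checksPerformed()); // 1, not 1000
    }
}
```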
[jira] [Commented] (YARN-1644) RM-NM protocol changes and NodeStatusUpdater implementation to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739679#comment-14739679 ] Wangda Tan commented on YARN-1644: -- [~mding], could you take a look at findbugs? I can reproduce it locally. You can run "mvn clean findbugs:findbugs" under yarn-server-common. Please open a ticket to track the findbugs fix if you can reproduce it. > RM-NM protocol changes and NodeStatusUpdater implementation to support > container resizing > - > > Key: YARN-1644 > URL: https://issues.apache.org/jira/browse/YARN-1644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Wangda Tan >Assignee: MENG DING > Fix For: YARN-1197 > > Attachments: YARN-1644-YARN-1197.4.patch, > YARN-1644-YARN-1197.5.patch, YARN-1644-YARN-1197.6.patch, YARN-1644.1.patch, > YARN-1644.2.patch, YARN-1644.3.patch, yarn-1644.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM
[ https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739642#comment-14739642 ] Hudson commented on YARN-4106: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #354 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/354/]) YARN-4106. NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM. (Bibin A Chundatt via wangda) (wangda: rev 77666105b4557d5706e5844a4ca286917d966c5f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/ConfigurationNodeLabelsProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/AbstractNodeLabelsProvider.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/TestConfigurationNodeLabelsProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java > NodeLabels for NM in distributed mode is not updated even after > clusterNodelabel addition in RM > --- > > Key: YARN-4106 > URL: https://issues.apache.org/jira/browse/YARN-4106 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Fix For: 2.8.0 > > Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, > 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, > 0006-YARN-4106.patch, 0007-YARN-4106.patch, 0008-YARN-4106.patch > > > NodeLabels for NM in distributed mode is not updated even after > clusterNodelabel addition in RM > Steps to reproduce > === > # Configure nodelabel in distributed mode > yarn.node-labels.configuration-type=distributed > 
provider = config > yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms > # Start the RM and the NM > # Once NM registration is done, add node labels in RM > Node labels are not getting updated on the RM side > *This jira also handles the below issue too* > Timer task not getting triggered in the Nodemanager for label update in > nodemanager for distributed scheduling > The task is supposed to trigger every > {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
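The repro steps above correspond to a yarn-site.xml fragment along these lines. This is a hedged reconstruction: the property names are taken from the steps as written, while the fetch-interval value in the report ("12ms") appears truncated, so a placeholder value is used here.

```xml
<!-- Sketch of the distributed node-label configuration from the repro steps -->
<property>
  <name>yarn.node-labels.configuration-type</name>
  <value>distributed</value>
</property>
<property>
  <name>yarn.nodemanager.node-labels.provider</name>
  <value>config</value>
</property>
<property>
  <name>yarn.nodemanager.node-labels.provider.fetch-interval-ms</name>
  <!-- placeholder interval; the value in the original report is truncated -->
  <value>120000</value>
</property>
```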
[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM
[ https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739625#comment-14739625 ] Hudson commented on YARN-4106: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2293 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2293/]) YARN-4106. NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM. (Bibin A Chundatt via wangda) (wangda: rev 77666105b4557d5706e5844a4ca286917d966c5f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/TestConfigurationNodeLabelsProvider.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/AbstractNodeLabelsProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/ConfigurationNodeLabelsProvider.java > NodeLabels for NM in distributed mode is not updated even after > clusterNodelabel addition in RM > --- > > Key: YARN-4106 > URL: https://issues.apache.org/jira/browse/YARN-4106 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Fix For: 2.8.0 > > Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, > 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, > 0006-YARN-4106.patch, 0007-YARN-4106.patch, 0008-YARN-4106.patch > > > NodeLabels for NM in distributed mode is not updated even after > clusterNodelabel addition in RM > Steps to reproduce > === > # Configure nodelabel in distributed mode > yarn.node-labels.configuration-type=distributed > provider = config 
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms > # Start the RM and the NM > # Once NM registration is done, add node labels in RM > Node labels are not getting updated on the RM side > *This jira also handles the below issue too* > Timer task not getting triggered in the Nodemanager for label update in > nodemanager for distributed scheduling > The task is supposed to trigger every > {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739618#comment-14739618 ] Anubhav Dhoot commented on YARN-2005: - [~He Tianyi] yes we are using the ContainerExitStatus in this. We can refine the conditions in a followup if needed. > Blacklisting support for scheduling AMs > --- > > Key: YARN-2005 > URL: https://issues.apache.org/jira/browse/YARN-2005 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 0.23.10, 2.4.0 >Reporter: Jason Lowe >Assignee: Anubhav Dhoot > Attachments: YARN-2005.001.patch, YARN-2005.002.patch, > YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, > YARN-2005.006.patch, YARN-2005.006.patch, YARN-2005.007.patch, > YARN-2005.008.patch > > > It would be nice if the RM supported blacklisting a node for an AM launch > after the same node fails a configurable number of AM attempts. This would > be similar to the blacklisting support for scheduling task attempts in the > MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739614#comment-14739614 ] Anubhav Dhoot commented on YARN-2005: - Hi [~kasha], thanks for your comments. 2.4 - we do not need to update the systemBlacklist as it's updated by the RMAppAttemptImpl#ScheduleTransition call every time to the complete list. 11, 12 - The changes were needed because now we need a valid submission context for isWaitingForAMContainer. 9 - It is needed by the new test added in TestAMRestart. 8.3 - Yes, I can file a follow-up for that. Addressed the rest of them. > Blacklisting support for scheduling AMs > --- > > Key: YARN-2005 > URL: https://issues.apache.org/jira/browse/YARN-2005 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 0.23.10, 2.4.0 >Reporter: Jason Lowe >Assignee: Anubhav Dhoot > Attachments: YARN-2005.001.patch, YARN-2005.002.patch, > YARN-2005.003.patch, YARN-2005.004.patch, YARN-2005.005.patch, > YARN-2005.006.patch, YARN-2005.006.patch, YARN-2005.007.patch, > YARN-2005.008.patch > > > It would be nice if the RM supported blacklisting a node for an AM launch > after the same node fails a configurable number of AM attempts. This would > be similar to the blacklisting support for scheduling task attempts in the > MapReduce AM but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3700) ATS Web Performance issue at load time when large number of jobs
[ https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3700: -- Attachment: YARN-3700-branch-2.7.2.txt Attaching 2.7.2 patch that I committed. > ATS Web Performance issue at load time when large number of jobs > > > Key: YARN-3700 > URL: https://issues.apache.org/jira/browse/YARN-3700 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Reporter: Xuan Gong >Assignee: Xuan Gong > Labels: 2.6.1-candidate, 2.7.2-candidate > Fix For: 2.6.1, 2.8.0, 2.7.2 > > Attachments: YARN-3700-branch-2.6.1.txt, YARN-3700-branch-2.7.2.txt, > YARN-3700.1.patch, YARN-3700.2.1.patch, YARN-3700.2.2.patch, > YARN-3700.2.patch, YARN-3700.3.patch, YARN-3700.4.patch > > > Currently, we will load all the apps when we try to load the yarn > timelineservice web page. If we have large number of jobs, it will be very > slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3700) ATS Web Performance issue at load time when large number of jobs
[ https://issues.apache.org/jira/browse/YARN-3700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3700: -- Fix Version/s: 2.7.2 Just pulled this into branch-2.7 (release 2.7.2) as it already exists in 2.6.1. branch-2 patch had merge conflicts. Ran compilation and TestApplicationHistoryClientService, TestApplicationHistoryManagerOnTimelineStore before the push. > ATS Web Performance issue at load time when large number of jobs > > > Key: YARN-3700 > URL: https://issues.apache.org/jira/browse/YARN-3700 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Reporter: Xuan Gong >Assignee: Xuan Gong > Labels: 2.6.1-candidate, 2.7.2-candidate > Fix For: 2.6.1, 2.8.0, 2.7.2 > > Attachments: YARN-3700-branch-2.6.1.txt, YARN-3700.1.patch, > YARN-3700.2.1.patch, YARN-3700.2.2.patch, YARN-3700.2.patch, > YARN-3700.3.patch, YARN-3700.4.patch > > > Currently, we will load all the apps when we try to load the yarn > timelineservice web page. If we have large number of jobs, it will be very > slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM
[ https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739454#comment-14739454 ] Hudson commented on YARN-4106: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2316 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2316/]) YARN-4106. NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM. (Bibin A Chundatt via wangda) (wangda: rev 77666105b4557d5706e5844a4ca286917d966c5f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/ConfigurationNodeLabelsProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/TestConfigurationNodeLabelsProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/AbstractNodeLabelsProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/CHANGES.txt > NodeLabels for NM in distributed mode is not updated even after > clusterNodelabel addition in RM > --- > > Key: YARN-4106 > URL: https://issues.apache.org/jira/browse/YARN-4106 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Fix For: 2.8.0 > > Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, > 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, > 0006-YARN-4106.patch, 0007-YARN-4106.patch, 0008-YARN-4106.patch > > > NodeLabels for NM in distributed mode is not updated even after > clusterNodelabel addition in RM > Steps to reproduce > === > # Configure nodelabel in distributed mode > yarn.node-labels.configuration-type=distributed > 
provider = config > yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms > # Start the RM and the NM > # Once NM registration is done, add node labels in RM > Node labels are not getting updated on the RM side > *This jira also handles the below issue too* > Timer task not getting triggered in the Nodemanager for label update in > nodemanager for distributed scheduling > The task is supposed to trigger every > {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2890) MiniYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2890: -- Fix Version/s: 2.7.2 Just pulled this into branch-2.7 (release 2.7.2) as it already exists in 2.6.1. branch-2 patch applies cleanly. Ran compilation and TestJobHistoryEventHandler, TestMRTimelineEventHandling, TestDistributedShell, TestMiniYarnCluster before the push. > MiniYarnCluster should turn on timeline service if configured to do so > -- > > Key: YARN-2890 > URL: https://issues.apache.org/jira/browse/YARN-2890 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Mit Desai > Labels: 2.6.1-candidate, 2.7.2-candidate > Fix For: 2.6.1, 2.8.0, 2.7.2 > > Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, > YARN-2890.4.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, > YARN-2890.patch, YARN-2890.patch > > > Currently the MiniMRYarnCluster does not consider the configuration value for > enabling timeline service before starting. The MiniYarnCluster should only > start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4141) Runtime Application Priority change should not throw exception for applications at finishing states
[ https://issues.apache.org/jira/browse/YARN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739428#comment-14739428 ] Hadoop QA commented on YARN-4141: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 3s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 53s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 7s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 50s | The applied patch generated 1 new checkstyle issues (total was 33, now 34). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 30s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 54m 37s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 94m 31s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12755171/0002-YARN-4141.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7766610 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9078/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9078/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9078/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9078/console | This message was automatically generated. > Runtime Application Priority change should not throw exception for > applications at finishing states > --- > > Key: YARN-4141 > URL: https://issues.apache.org/jira/browse/YARN-4141 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4141.patch, 0002-YARN-4141.patch > > > As suggested by [~jlowe] in > [MAPREDUCE-5870-comment|https://issues.apache.org/jira/browse/MAPREDUCE-5870?focusedCommentId=14737035&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14737035] > , its good that if YARN can suppress exceptions during change application > priority calls for applications at its finishing stages. > Currently it will be difficult for clients to handle this. This will be > similar to kill application behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
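The behavior YARN-4141 proposes — suppress the exception for priority changes on applications that are already finishing, mirroring how kill-application behaves — can be sketched as follows. This is a hypothetical illustration; the names and state set are simplified stand-ins, not the actual ResourceManager code:

```java
import java.util.EnumSet;

public class PrioritySuppressSketch {
    enum AppState { RUNNING, FINISHING, FINISHED, KILLED, FAILED }

    static final EnumSet<AppState> COMPLETING =
        EnumSet.of(AppState.FINISHING, AppState.FINISHED,
                   AppState.KILLED, AppState.FAILED);

    // Returns true if the priority update was applied, false if it was
    // silently skipped because the app is already completing.
    static boolean updatePriority(AppState state, int newPriority) {
        if (COMPLETING.contains(state)) {
            return false; // no-op instead of an exception back to the client
        }
        // ... apply the new priority to the scheduler here ...
        return true;
    }

    public static void main(String[] args) {
        System.out.println(updatePriority(AppState.RUNNING, 20));   // applied
        System.out.println(updatePriority(AppState.FINISHING, 20)); // skipped, no error
    }
}
```

This spares clients from having to catch and interpret an exception for a race they cannot avoid: an application can transition into a finishing state between the client's status check and its priority-change call.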
[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM
[ https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739383#comment-14739383 ] Hudson commented on YARN-4106: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1106 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1106/]) YARN-4106. NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM. (Bibin A Chundatt via wangda) (wangda: rev 77666105b4557d5706e5844a4ca286917d966c5f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/TestConfigurationNodeLabelsProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/ConfigurationNodeLabelsProvider.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/nodelabels/AbstractNodeLabelsProvider.java * hadoop-yarn-project/CHANGES.txt > NodeLabels for NM in distributed mode is not updated even after > clusterNodelabel addition in RM > --- > > Key: YARN-4106 > URL: https://issues.apache.org/jira/browse/YARN-4106 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Fix For: 2.8.0 > > Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, > 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, > 0006-YARN-4106.patch, 0007-YARN-4106.patch, 0008-YARN-4106.patch > > > NodeLabels for NM in distributed mode is not updated even after > clusterNodelabel addition in RM > Steps to reproduce > === > # Configure nodelabel in distributed mode > yarn.node-labels.configuration-type=distributed > provider = config 
> yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms > # Start the RM and the NM > # Once NM registration is done, add nodelabels in the RM > Nodelabels are not getting updated on the RM side > *This jira also handles the below issue* > Timer task not getting triggered in the Nodemanager for label update in > distributed scheduling > The task is supposed to trigger every > {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
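The timer behaviour described above (a task that re-fetches node labels every {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}}) can be sketched roughly as below. This is an illustrative sketch only, not the actual Hadoop code; the class and method names, and the ten-minute default, are assumptions for illustration.

```java
import java.util.Timer;
import java.util.TimerTask;

// Illustrative sketch (hypothetical names): a provider that re-fetches
// node labels on a fixed interval, the behaviour the fetch-interval
// property is supposed to drive.
class LabelFetchSketch {
    static final long DEFAULT_FETCH_INTERVAL_MS = 10 * 60 * 1000L;

    // Fall back to the default when the configured value is missing or invalid.
    static long effectiveInterval(long configuredMs) {
        return configuredMs > 0 ? configuredMs : DEFAULT_FETCH_INTERVAL_MS;
    }

    // Schedule the fetch task: run once immediately, then at every interval.
    static Timer startFetcher(long intervalMs, Runnable fetch) {
        Timer timer = new Timer("node-label-fetcher", true); // daemon thread
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override public void run() { fetch.run(); }
        }, 0, intervalMs);
        return timer;
    }
}
```

The bug discussed in this thread is precisely the case where such a task is never scheduled, so the interval property has no effect.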
[jira] [Commented] (YARN-4140) RM container allocation delayed in case of app submitted to Nodelabel partition
[ https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739368#comment-14739368 ] Sunil G commented on YARN-4140: --- Yes [~leftnoteasy], thank you for clarifying. This makes sense. > RM container allocation delayed in case of app submitted to Nodelabel partition > -- > > Key: YARN-4140 > URL: https://issues.apache.org/jira/browse/YARN-4140 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > Trying to run an application on a Nodelabel partition, I found that the > application execution time is delayed by 5-10 min for 500 containers. > Of 3 machines in total, 2 were in the same partition and the app was submitted to it. > After enabling debug I was able to find the below: > # From the AM, the container ask is for OFF-SWITCH > # The RM allocates all containers as NODE_LOCAL, as shown in the logs below. > # Since I was requesting about 500 containers, it took about 6 minutes > to allocate the 1st map after AM allocation. 
> # Tested with about 1K maps using PI job took 17 minutes to allocate next > container after AM allocation > Once 500 container allocation on NODE_LOCAL is done the next container > allocation is done on OFF_SWITCH > {code} > 2015-09-09 15:21:58,954 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: > showRequests: application=application_1441791998224_0001 request={Priority: > 20, Capability: , # Containers: 500, Location: > /default-rack, Relax Locality: true, Node Label Expression: } > 2015-09-09 15:21:58,954 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: > showRequests: application=application_1441791998224_0001 request={Priority: > 20, Capability: , # Containers: 500, Location: *, Relax > Locality: true, Node Label Expression: 3} > 2015-09-09 15:21:58,954 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: > showRequests: application=application_1441791998224_0001 request={Priority: > 20, Capability: , # Containers: 500, Location: > host-10-19-92-143, Relax Locality: true, Node Label Expression: } > 2015-09-09 15:21:58,954 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: > showRequests: application=application_1441791998224_0001 request={Priority: > 20, Capability: , # Containers: 500, Location: > host-10-19-92-117, Relax Locality: true, Node Label Expression: } > 2015-09-09 15:21:58,954 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, > usedResources=, usedCapacity=0.0, > absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> vCores:0>, NODE_LOCAL > {code} > > {code} > 2015-09-09 14:35:45,467 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, > usedResources=, 
usedCapacity=0.0, > absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> vCores:0>, NODE_LOCAL > 2015-09-09 14:35:45,831 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, > usedResources=, usedCapacity=0.0, > absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> vCores:0>, NODE_LOCAL > 2015-09-09 14:35:46,469 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, > usedResources=, usedCapacity=0.0, > absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> vCores:0>, NODE_LOCAL > 2015-09-09 14:35:46,832 DEBUG > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: > Assigned to queue: root.b.b1 stats: b1: capacity=1.0, absoluteCapacity=0.5, > usedResources=, usedCapacity=0.0, > absoluteUsedCapacity=0.0, numApps=1, numContainers=1 --> vCores:0>, NODE_LOCAL > {code} > {code} > dsperf@host-127:/opt/bibin/dsperf/HAINSTALL/install/hadoop/resourcemanager/logs1> > cat hadoop-dsperf-resourcemanager-host-127.log | grep "NODE_LOCAL" | grep > "root.b.b1" | wc -l > 500 > {code} > > (Consumes about 6 minutes) > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4142) add a way for an attempt to report an attempt failure
[ https://issues.apache.org/jira/browse/YARN-4142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739361#comment-14739361 ] Sunil G commented on YARN-4142: --- Hi [~steve_l], I have a doubt here. In the NodeManager {{ContainerImpl}}, we set diagnostics and the exit code for a few error cases. So does "application explicitly terminates an attempt" here mean that the AM kills itself for some reason, or that the AM container/attempt is killed by a CLI command? > add a way for an attempt to report an attempt failure > - > > Key: YARN-4142 > URL: https://issues.apache.org/jira/browse/YARN-4142 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Affects Versions: 2.8.0 >Reporter: Steve Loughran > > Currently AMs can report a failure with exit code and diagnostics text, but > only when exiting to a failed state. If the AM terminates for any other > reason there's no information held in the RM, just the logs somewhere, and we > know they don't always last. > When an application explicitly terminates an attempt, it would be nice if it > could optionally report something to the RM before it exited. The most > recent set of these could then be included in Application Reports, so > allowing client apps to count attempt failures and get exit details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
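As a rough illustration of the "most recent set" idea in the description, the RM could keep a bounded list of attempt-failure notes per application. The sketch below uses entirely hypothetical names; nothing in it exists in YARN's actual API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: keep only the most recent N failure notes so the
// report the RM would surface stays bounded no matter how many attempts fail.
class AttemptFailureNotes {
    private final int limit;
    private final Deque<String> notes = new ArrayDeque<>();

    AttemptFailureNotes(int limit) { this.limit = limit; }

    void report(String note) {
        if (notes.size() == limit) {
            notes.removeFirst(); // drop the oldest note to stay within the cap
        }
        notes.addLast(note);
    }

    // Oldest-to-newest view of the retained notes.
    List<String> mostRecent() { return new ArrayList<>(notes); }
}
```

A fixed cap matches the proposal's goal: clients can count recent attempt failures and read exit details without the RM retaining unbounded history.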
[jira] [Commented] (YARN-4140) RM container allocation delayed in case of app submitted to Nodelabel partition
[ https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739349#comment-14739349 ] Wangda Tan commented on YARN-4140: -- We force the client not to set a node-label expression (in YARN-2694) because we don't want the client to set different node-label expressions for different resourceNames at the same priority (e.g. for priority=2, "rack-1"'s node-label-expression="x" but "*"'s node-label-expression="y"). Remember we count pendingResource using the "*" request of each priority. But we can normalize the node-label expression once requests are sent to the scheduler. Makes sense? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
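The normalization idea discussed above (within one priority, every resource name inherits the node-label expression from the "*" request) can be sketched as follows. This is a simplified illustration with hypothetical names, not the actual scheduler code.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch: within a single priority, every resource name
// (node, rack, "*") should end up with the same node-label expression,
// inherited from the ANY ("*") request when not set explicitly.
class LabelExpressionNormalizer {
    static Map<String, String> normalize(Map<String, String> exprByResourceName) {
        String anyExpr = exprByResourceName.get("*");
        Map<String, String> normalized = new HashMap<>();
        for (Map.Entry<String, String> e : exprByResourceName.entrySet()) {
            String expr = e.getValue();
            boolean unset = (expr == null || expr.isEmpty());
            // Unset node/rack requests inherit the ANY expression.
            normalized.put(e.getKey(), unset ? anyExpr : expr);
        }
        return normalized;
    }
}
```

Under this scheme, the log pattern in the report (label "3" only on the "*" request, blank on node and rack requests) would normalize to "3" everywhere, avoiding the mismatched NODE_LOCAL allocations.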
[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM
[ https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739309#comment-14739309 ] Hudson commented on YARN-4106: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #374 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/374/]) YARN-4106. NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM. (Bibin A Chundatt via wangda) (wangda: rev 77666105b4557d5706e5844a4ca286917d966c5f) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4142) add a way for an attempt to report an attempt failure
Steve Loughran created YARN-4142: Summary: add a way for an attempt to report an attempt failure Key: YARN-4142 URL: https://issues.apache.org/jira/browse/YARN-4142 Project: Hadoop YARN Issue Type: Sub-task Components: api Affects Versions: 2.8.0 Reporter: Steve Loughran Currently AMs can report a failure with exit code and diagnostics text —but only when exiting to a failed state. If the AM terminates for any other reason there's no information held in the RM, just the logs somewhere —and we know they don't always last. When an application explicitly terminates an attempt, it would be nice if it could optionally report something to the RM before it exited. The most recent set of these could then be included in Application Reports, so allowing client apps to count attempt failures and get exit details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM
[ https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739290#comment-14739290 ] Hudson commented on YARN-4106: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #368 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/368/]) YARN-4106. NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM. (Bibin A Chundatt via wangda) (wangda: rev 77666105b4557d5706e5844a4ca286917d966c5f) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4140) RM container allocation delayed in case of app submitted to Nodelabel partition
[ https://issues.apache.org/jira/browse/YARN-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739282#comment-14739282 ] Sunil G commented on YARN-4140: --- Hi [~leftnoteasy], I have a doubt here. The node-label expression is set on the ANY request by the AM. Is there any reason why it is not updated for the node-local and rack-local requests there itself? Could you please help clarify? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4102) Add a "skip existing table" mode for timeline schema creator
[ https://issues.apache.org/jira/browse/YARN-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739270#comment-14739270 ] Sangjin Lee commented on YARN-4102: --- Hi [~gtCarrera9], it looks good to me too. Do you mind fixing that one little checkstyle issue, though? Then I think we can commit this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
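The incremental mode under discussion amounts to filtering out tables that already exist instead of aborting on the first one. A minimal sketch of that decision logic follows, with hypothetical names and example table names; the real patch works against the HBase Admin API, which is omitted here.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch only: decide which tables the schema creator should create.
// In skip-existing ("incremental") mode, existing tables are left
// untouched; otherwise the first existing table aborts the run, which
// is the current behaviour the issue describes.
class IncrementalSchemaSketch {
    static Set<String> tablesToCreate(List<String> wanted, Set<String> existing,
                                      boolean skipExisting) {
        Set<String> toCreate = new LinkedHashSet<>();
        for (String table : wanted) {
            if (existing.contains(table)) {
                if (!skipExisting) {
                    throw new IllegalStateException("table exists: " + table);
                }
                continue; // incremental mode: skip, do not recreate
            }
            toCreate.add(table);
        }
        return toCreate;
    }
}
```

As the description notes, skipping creation keeps existing tables (and their data) untouched, so cross-table consistency is the operator's responsibility in this mode.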
[jira] [Commented] (YARN-3717) Expose app/am/queue's node-label-expression to RM web UI / CLI / REST-API
[ https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739204#comment-14739204 ] Naganarasimha G R commented on YARN-3717: - Thanks for the feedback [~leftnoteasy]. bq. We can improve this in later patches. OK, I will set it in the CLI and web UI as "" and REST will have null. bq. This is more important to me, now we cannot do this through REST API, which will block effort of YARN-3368 to support showing labels metrics as well. I will start working on this immediately after I finish this jira. > Expose app/am/queue's node-label-expression to RM web UI / CLI / REST-API > - > > Key: YARN-3717 > URL: https://issues.apache.org/jira/browse/YARN-3717 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, > YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, > YARN-3717.20150825-1.patch, YARN-3717.20150826-1.patch > > > 1> Add the default node-label expression for each queue in the scheduler page. > 2> In the Application/Appattempt page, show the app's configured node label > expression for the AM and Job -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM
[ https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739172#comment-14739172 ] Hudson commented on YARN-4106: -- FAILURE: Integrated in Hadoop-trunk-Commit #8430 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8430/]) YARN-4106. NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM. (Bibin A Chundatt via wangda) (wangda: rev 77666105b4557d5706e5844a4ca286917d966c5f) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity
[ https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739121#comment-14739121 ] Sunil G commented on YARN-4091: --- Thank you [~leftnoteasy] for sharing the thoughts. Yes, the REST framework looks fine. But after the first response comes back as "pending fetching", a second REST query has to be made to see the real result. Alternatively, we could dump this information to logs. I feel returning the information as REST output is better, and we can utilize this framework in the new UI. Hence the timing of the second REST query is important, since the intended node heartbeat has to happen first (or, by the time the query arrives, more heartbeats from the same node may have come in). Showing aggregate debug information up to the second query is good, but I worry about the load on the RM and the amount of data produced. A time limit (or a minimum count of heartbeats to debug) can help in this case. Thoughts? > Improvement: Introduce more debug/diagnostics information to detail out > scheduler activity > -- > > Key: YARN-4091 > URL: https://issues.apache.org/jira/browse/YARN-4091 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.0 >Reporter: Sunil G >Assignee: Sunil G > Attachments: Improvement on debugdiagnostic information - YARN.pdf > > > As schedulers are improved with various new capabilities, more configurations > that tune the schedulers start to take actions such as limiting container assignment > to an application, or introducing a delay before allocating a container, etc. > There is no clear information passed down from the scheduler to the outer world in > these various scenarios, which makes debugging much tougher. > This ticket is an effort to introduce more well-defined states at various points in the > scheduler where it skips/rejects container assignment, activates an application, > etc. Such information will help users know what is happening in the scheduler. 
> Attaching a short proposal for initial discussion. We would like to improve > on this as we discuss. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
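The time limit / minimum heartbeat count suggested in the comment above could be combined into a small bounded collector, sketched below with hypothetical names (this is not part of any YARN patch):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: record per-heartbeat scheduler diagnostics only until either a
// heartbeat-count cap or a wall-clock deadline is reached, so a follow-up
// REST query returns a bounded aggregate instead of unbounded data.
class BoundedDiagnostics {
    private final int maxHeartbeats;
    private final long deadlineMillis;
    private final List<String> entries = new ArrayList<>();

    BoundedDiagnostics(int maxHeartbeats, long windowMillis, long nowMillis) {
        this.maxHeartbeats = maxHeartbeats;
        this.deadlineMillis = nowMillis + windowMillis;
    }

    // Returns false once either bound is hit; the caller stops recording.
    boolean record(String diagnostic, long nowMillis) {
        if (entries.size() >= maxHeartbeats || nowMillis > deadlineMillis) {
            return false;
        }
        entries.add(diagnostic);
        return true;
    }

    List<String> snapshot() { return new ArrayList<>(entries); }
}
```

Either bound alone addresses the load concern; using both means slow-heartbeating nodes are cut off by the deadline while busy ones are cut off by the count.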
[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM
[ https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739113#comment-14739113 ] Wangda Tan commented on YARN-4106: -- bq. ... may be Labels Manager can support additional method which adds the missing labels first and then updates the mapping Doing this could be hard to manage: for example, how do we deal with node label removal? We could do that when the reference count of a label becomes zero, but resource requests could be rejected if we remove an existing label. I would prefer not to add more flexibility to node partitions, since it will likely break something. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
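The reference-count concern raised above can be made concrete with a small sketch (hypothetical names, not the RM's actual label manager): a label becomes removable only once no node references it, which is exactly the lifecycle that is hard to manage if NMs can add labels on their own.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: count how many nodes reference each label; removal is only
// safe once a label's reference count has dropped to zero.
class LabelRefCount {
    private final Map<String, Integer> refs = new HashMap<>();

    void addNodeRef(String label) {
        refs.merge(label, 1, Integer::sum);
    }

    void removeNodeRef(String label) {
        // Drop the entry entirely when the count reaches zero.
        refs.computeIfPresent(label, (l, c) -> c > 1 ? c - 1 : null);
    }

    boolean canRemoveLabel(String label) {
        return !refs.containsKey(label);
    }
}
```

The catch the comment points out: even when the count reaches zero, outstanding resource requests naming the label would start failing if it were removed, so zero references is necessary but not sufficient.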
[jira] [Updated] (YARN-4141) Runtime Application Priority change should not throw exception for applications at finishing states
[ https://issues.apache.org/jira/browse/YARN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4141: -- Attachment: 0002-YARN-4141.patch In the meantime, attaching a new patch addressing point 1. We will wait for input on point 2. > Runtime Application Priority change should not throw exception for > applications at finishing states > --- > > Key: YARN-4141 > URL: https://issues.apache.org/jira/browse/YARN-4141 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4141.patch, 0002-YARN-4141.patch > > > As suggested by [~jlowe] in > [MAPREDUCE-5870-comment|https://issues.apache.org/jira/browse/MAPREDUCE-5870?focusedCommentId=14737035&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14737035] > , it would be good if YARN could suppress exceptions during change application > priority calls for applications at their finishing stages. > Currently it is difficult for clients to handle this. This will be > similar to the kill application behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM
[ https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739097#comment-14739097 ] Naganarasimha G R commented on YARN-4106: - bq. Without a centralized node partition collection, capacity planning will be not straightforward. Yes, the idea is to still have this collection; the only difference is that if a label sent from the NM is not present at the RM, then maybe the Labels Manager can support an additional method which adds the missing labels first and then updates the mapping. Thoughts? Yes, it makes more sense when we support node constraints, but if we want to support more flexibility then we can think of supporting this too. > NodeLabels for NM in distributed mode is not updated even after > clusterNodelabel addition in RM > --- > > Key: YARN-4106 > URL: https://issues.apache.org/jira/browse/YARN-4106 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Fix For: 2.8.0 > > Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, > 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, > 0006-YARN-4106.patch, 0007-YARN-4106.patch, 0008-YARN-4106.patch > > > NodeLabels for NM in distributed mode is not updated even after > clusterNodelabel addition in RM > Steps to reproduce > === > # Configure nodelabel in distributed mode > yarn.node-labels.configuration-type=distributed > provider = config > yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms > # Start RM the NM > # Once NM is registration is done add nodelabels in RM > Nodelabels not getting updated in RM side > *This jira also handles the below issue too* > Timer Task not getting triggered in Nodemanager for Label update in > nodemanager for distributed scheduling > Task is supposed to trigger every > {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM
[ https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739078#comment-14739078 ] Wangda Tan commented on YARN-4106: -- [~Naganarasimha], bq. Wangda Tan As far as this patch its fine but was wondering to increase usability do we need to support YARN-2728, Support for disabling the Centralized NodeLabel validation in Distributed Node Label Configuration setup ? Since we only support node partition, and node partition relates to capacity planning, etc. Without a centralized node partition collection, capacity planning will be not straightforward. I think YARN-2728 makes more sense when we support node constraint. > NodeLabels for NM in distributed mode is not updated even after > clusterNodelabel addition in RM > --- > > Key: YARN-4106 > URL: https://issues.apache.org/jira/browse/YARN-4106 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Fix For: 2.8.0 > > Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, > 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, > 0006-YARN-4106.patch, 0007-YARN-4106.patch, 0008-YARN-4106.patch > > > NodeLabels for NM in distributed mode is not updated even after > clusterNodelabel addition in RM > Steps to reproduce > === > # Configure nodelabel in distributed mode > yarn.node-labels.configuration-type=distributed > provider = config > yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms > # Start RM the NM > # Once NM is registration is done add nodelabels in RM > Nodelabels not getting updated in RM side > *This jira also handles the below issue too* > Timer Task not getting triggered in Nodemanager for Label update in > nodemanager for distributed scheduling > Task is supposed to trigger every > {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4141) Runtime Application Priority change should not throw exception for applications at finishing states
[ https://issues.apache.org/jira/browse/YARN-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739074#comment-14739074 ] Sunil G commented on YARN-4141: --- Hi [~rohithsharma], thank you for the comments. I have one input on the second comment: as we are not updating the priority here, it's not a success, correct? Hence I marked it as a failure. What do you think? > Runtime Application Priority change should not throw exception for > applications at finishing states > --- > > Key: YARN-4141 > URL: https://issues.apache.org/jira/browse/YARN-4141 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4141.patch > > > As suggested by [~jlowe] in > [MAPREDUCE-5870-comment|https://issues.apache.org/jira/browse/MAPREDUCE-5870?focusedCommentId=14737035&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14737035] > , it would be good if YARN could suppress exceptions during change application > priority calls for applications at their finishing stages. > Currently it is difficult for clients to handle this. This will be > similar to the kill application behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM
[ https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739064#comment-14739064 ] Naganarasimha G R commented on YARN-4106: - +1, Applied the latest patch, ran the test cases and applied YARN-2729 on top of this patch and script was also running successfully. Latest patch LGTM. [~leftnoteasy] As far as this patch its fine but was wondering to increase usability do we need to support YARN-2728, ??Support for disabling the Centralized NodeLabel validation in Distributed Node Label Configuration setup?? ? > NodeLabels for NM in distributed mode is not updated even after > clusterNodelabel addition in RM > --- > > Key: YARN-4106 > URL: https://issues.apache.org/jira/browse/YARN-4106 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, > 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, > 0006-YARN-4106.patch, 0007-YARN-4106.patch, 0008-YARN-4106.patch > > > NodeLabels for NM in distributed mode is not updated even after > clusterNodelabel addition in RM > Steps to reproduce > === > # Configure nodelabel in distributed mode > yarn.node-labels.configuration-type=distributed > provider = config > yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms > # Start RM the NM > # Once NM is registration is done add nodelabels in RM > Nodelabels not getting updated in RM side > *This jira also handles the below issue too* > Timer Task not getting triggered in Nodemanager for Label update in > nodemanager for distributed scheduling > Task is supposed to trigger every > {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4106) NodeLabels for NM in distributed mode is not updated even after clusterNodelabel addition in RM
[ https://issues.apache.org/jira/browse/YARN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739052#comment-14739052 ] Wangda Tan commented on YARN-4106: -- +1 to latest patch. > NodeLabels for NM in distributed mode is not updated even after > clusterNodelabel addition in RM > --- > > Key: YARN-4106 > URL: https://issues.apache.org/jira/browse/YARN-4106 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4106.patch, 0002-YARN-4106.patch, > 0003-YARN-4106.patch, 0004-YARN-4106.patch, 0005-YARN-4106.patch, > 0006-YARN-4106.patch, 0007-YARN-4106.patch, 0008-YARN-4106.patch > > > NodeLabels for NM in distributed mode is not updated even after > clusterNodelabel addition in RM > Steps to reproduce > === > # Configure nodelabel in distributed mode > yarn.node-labels.configuration-type=distributed > provider = config > yarn.nodemanager.node-labels.provider.fetch-interval-ms=12ms > # Start RM the NM > # Once NM is registration is done add nodelabels in RM > Nodelabels not getting updated in RM side > *This jira also handles the below issue too* > Timer Task not getting triggered in Nodemanager for Label update in > nodemanager for distributed scheduling > Task is supposed to trigger every > {{yarn.nodemanager.node-labels.provider.fetch-interval-ms}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3717) Expose app/am/queue's node-label-expression to RM web UI / CLI / REST-API
[ https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739046#comment-14739046 ] Wangda Tan commented on YARN-3717: -- [~Naganarasimha], bq. Well for a naive user will atleast know to what to look at. Or how about your idea of (For example, showing queue's label when the app's label doesn't set, etc.) I think we should do this -- "showing queue's label when the app's label doesn't set..", but I think this may need some effort, I haven't thought about it, it may need some changes in the scheduler side so RMApp can get label of queue. We can improve this in later patches. bq. Ok, Will raise and start working on them. please inform if the priorty is more so that can finish them faster. This is more important to me, now we cannot do this through REST API, which will block effort of YARN-3368 to support showing labels metrics as well. Thanks, > Expose app/am/queue's node-label-expression to RM web UI / CLI / REST-API > - > > Key: YARN-3717 > URL: https://issues.apache.org/jira/browse/YARN-3717 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, > YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, > YARN-3717.20150825-1.patch, YARN-3717.20150826-1.patch > > > 1> Add the default-node-Label expression for each queue in scheduler page. > 2> In Application/Appattempt page show the app configured node label > expression for AM and Job -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3901) Populate flow run data in the flow_run & flow activity tables
[ https://issues.apache.org/jira/browse/YARN-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739036#comment-14739036 ] Vrushali C commented on YARN-3901: -- Hi [~gtCarrera9] The start and end times for a flow run can be evaluated if you have all the known start times and end times of applications in that flow run and min/max timestamp can be evaluated. Hence this can be determined from the flow run table. But in the flow activity table, the purpose is to note that a flow was "active" on that day, meaning an application in that flow either started, completed or was running on that day. So when Joep and I had reviewed my patch together we realized that calculating the min/max in the flow activity table wont work for apps that span day boundaries and so in his comment on Aug 29th, there is a note "No timestamp needed in FlowActivity table. Runs can start one day and end another. Probably start without, add later if needed." That meant we did not need the coprocessor to determine min or max in the flow activity table. Hence I removed it. HTH Vrushali > Populate flow run data in the flow_run & flow activity tables > - > > Key: YARN-3901 > URL: https://issues.apache.org/jira/browse/YARN-3901 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > Attachments: YARN-3901-YARN-2928.1.patch, > YARN-3901-YARN-2928.2.patch, YARN-3901-YARN-2928.3.patch, > YARN-3901-YARN-2928.4.patch, YARN-3901-YARN-2928.5.patch > > > As per the schema proposed in YARN-3815 in > https://issues.apache.org/jira/secure/attachment/12743391/hbase-schema-proposal-for-aggregation.pdf > filing jira to track creation and population of data in the flow run table. 
> Some points that are being considered: > - Stores per flow run information aggregated across applications, flow version > RM’s collector writes to on app creation and app completion > - Per App collector writes to it for metric updates at a slower frequency > than the metric updates to application table > primary key: cluster ! user ! flow ! flow run id > - Only the latest version of flow-level aggregated metrics will be kept, even > if the entity and application level keep a timeseries. > - The running_apps column will be incremented on app creation, and > decremented on app completion. > - For min_start_time the RM writer will simply write a value with the tag for > the applicationId. A coprocessor will return the min value of all written > values. - > - Upon flush and compactions, the min value between all the cells of this > column will be written to the cell without any tag (empty tag) and all the > other cells will be discarded. > - Ditto for the max_end_time, but then the max will be kept. > - Tags are represented as #type:value. The type can be not set (0), or can > indicate running (1) or complete (2). In those cases (for metrics) only > complete app metrics are collapsed on compaction. > - The m! values are aggregated (summed) upon read. Only when applications are > completed (indicated by tag type 2) can the values be collapsed. > - The application ids that have completed and been aggregated into the flow > numbers are retained in a separate column for historical tracking: we don’t > want to re-aggregate for those upon replay > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
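The min/max rule described above (writers store per-application tagged cells; reads return the minimum; flush/compaction collapses everything into a single untagged cell) can be sketched outside HBase. The class and method names below are invented for illustration; the real implementation is an HBase coprocessor, not this toy model:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the min_start_time column: each RM writer stores a cell
// tagged with its applicationId; reads take the min across all cells, and
// compaction collapses them into one untagged cell holding that min.
public class MinStartTimeSketch {
  // A cell is a (tag, value) pair; an empty tag marks the collapsed cell.
  record Cell(String tag, long value) {}

  // Read path: return the minimum across all written cells.
  static long readMin(List<Cell> cells) {
    long min = Long.MAX_VALUE;
    for (Cell c : cells) {
      min = Math.min(min, c.value());
    }
    return min;
  }

  // Flush/compaction path: keep a single untagged cell with the minimum;
  // all tagged per-application cells are discarded.
  static List<Cell> compact(List<Cell> cells) {
    List<Cell> out = new ArrayList<>();
    out.add(new Cell("", readMin(cells)));
    return out;
  }

  public static void main(String[] args) {
    List<Cell> cells = List.of(
        new Cell("app_1", 1500L),
        new Cell("app_2", 1200L),
        new Cell("app_3", 1700L));
    System.out.println(readMin(cells));        // 1200
    System.out.println(compact(cells).size()); // 1
  }
}
```

The max_end_time column would follow the same shape with `Math.max`; the tag types (running vs. complete) described above would additionally gate which cells are allowed to collapse.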
[jira] [Commented] (YARN-2410) Nodemanager ShuffleHandler can possible exhaust file descriptors
[ https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738973#comment-14738973 ] Jason Lowe commented on YARN-2410: -- +1 for the latest patch. Committing this. > Nodemanager ShuffleHandler can possible exhaust file descriptors > > > Key: YARN-2410 > URL: https://issues.apache.org/jira/browse/YARN-2410 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Nathan Roberts >Assignee: Kuhu Shukla > Attachments: YARN-2410-v1.patch, YARN-2410-v10.patch, > YARN-2410-v11.patch, YARN-2410-v2.patch, YARN-2410-v3.patch, > YARN-2410-v4.patch, YARN-2410-v5.patch, YARN-2410-v6.patch, > YARN-2410-v7.patch, YARN-2410-v8.patch, YARN-2410-v9.patch > > > The async nature of the shufflehandler can cause it to open a huge number of > file descriptors; when it runs out, it crashes. > Scenario: > Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node. > Let's say all 6K reduces hit a node at about the same time asking for their > outputs. Each reducer will ask for all 40 map outputs over a single socket in > a > single request (not necessarily all 40 at once, but with coalescing it is > likely to be a large number). > sendMapOutput() will open the file for random reading and then perform an > async transfer of the particular portion of this file(). This will > theoretically > happen 6000*40=240,000 times, which will run the NM out of file descriptors and > cause it to crash. > The algorithm should be refactored a little to not open the fds until they're > actually needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
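The refactoring the description asks for -- not opening the fds until they're actually needed -- can be sketched as follows. This is an invented stand-alone model, not the actual ShuffleHandler/Netty code: pending transfers queue a *path* rather than an open file, and the descriptor exists only while its transfer runs.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of lazy fd opening: enqueue where the map output lives, and open
// the file only when its transfer is about to run, so thousands of queued
// requests do not each pin an open descriptor.
public class LazyOpenSketch {
  // A pending transfer remembers the location, not an open fd.
  record PendingOutput(String path, long offset, long length) {}

  private final Queue<PendingOutput> pending = new ArrayDeque<>();

  void enqueue(String path, long offset, long length) {
    pending.add(new PendingOutput(path, offset, length)); // no fd held yet
  }

  // Run one transfer; returns bytes transferred, or -1 if nothing pending.
  long transferNext(WritableByteChannel target) throws IOException {
    PendingOutput next = pending.poll();
    if (next == null) {
      return -1;
    }
    try (RandomAccessFile raf = new RandomAccessFile(next.path(), "r")) {
      FileChannel ch = raf.getChannel();
      return ch.transferTo(next.offset(), next.length(), target);
    } // fd closed here, before the next transfer opens its own
  }
}
```

In the real NodeManager the transfer is asynchronous, so the equivalent change is to defer the open into the point where the async write is actually scheduled rather than at request-parse time.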
[jira] [Commented] (YARN-4131) Add API and CLI to kill container on given containerId
[ https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738865#comment-14738865 ] Hadoop QA commented on YARN-4131: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 22m 19s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. | | {color:green}+1{color} | javac | 8m 0s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 0s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 3m 44s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 13s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 7m 27s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 107m 50s | Tests passed in hadoop-mapreduce-client-jobclient. | | {color:green}+1{color} | yarn tests | 0m 29s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 7m 1s | Tests passed in hadoop-yarn-client. | | {color:red}-1{color} | yarn tests | 2m 2s | Tests failed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 7m 38s | Tests passed in hadoop-yarn-server-nodemanager. | | {color:red}-1{color} | yarn tests | 54m 21s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 233m 37s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.logaggregation.TestAggregatedLogsBlock | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService | | | hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher | | | hadoop.yarn.server.resourcemanager.TestApplicationACLs | | | hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12754928/YARN-4131-v1.2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7b5b2c5 | | hadoop-mapreduce-client-jobclient test log | https://builds.apache.org/job/PreCommit-YARN-Build/9077/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9077/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/9077/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9077/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9077/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9077/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9077/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9077/console | 
This message was automatically generated. > Add API and CLI to kill container on given containerId > -- > > Key: YARN-4131 > URL: https://issues.apache.org/jira/browse/YARN-4131 > Project: Hadoop YARN > Issue Type: Sub-task > Components: applications, client >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch, > YARN-4131-v1.1.patch, YARN-4131-v1.2.patch, YARN-4131-v1.patch > > > Per YARN-3337, we need a handy tools to kill container in some scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.
[ https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738830#comment-14738830 ] MENG DING commented on YARN-1651: - Hi, [~leftnoteasy] One comment I forgot to post is that we may want to synchronize the RMContainerImpl.getAllocatedResource() call, because the container resource may be updated at any time, e.g.:
{code:title=RMContainerImpl.java}
@Override
public Resource getAllocatedResource() {
-  return container.getResource();
+  try {
+    readLock.lock();
+    return Resources.clone(container.getResource());
+  } finally {
+    readLock.unlock();
+  }
}
{code}
> CapacityScheduler side changes to support increase/decrease container > resource. > --- > > Key: YARN-1651 > URL: https://issues.apache.org/jira/browse/YARN-1651 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-1651-1.YARN-1197.patch, > YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, > YARN-1651-4.YARN-1197.patch, YARN-1651-5.YARN-1197.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
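The read-lock-plus-clone pattern suggested in that comment can be shown independently of the YARN classes. The names below are illustrative stand-ins, not the real RMContainerImpl or Resource types:

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Stand-alone illustration of a getter that takes the read lock and hands
// back a defensive copy, so callers never share mutable state with a
// concurrent writer or observe a half-updated object.
public class GuardedGetterSketch {
  static class Resource {
    int memory;
    int vcores;
    Resource(int memory, int vcores) { this.memory = memory; this.vcores = vcores; }
    Resource copyOf() { return new Resource(memory, vcores); }
  }

  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private Resource allocated = new Resource(1024, 1);

  // Writer: the container resource may be updated at any time.
  void setAllocatedResource(Resource r) {
    lock.writeLock().lock();
    try {
      allocated = r;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Reader: clone under the read lock; mutating the returned object
  // cannot corrupt the shared state.
  Resource getAllocatedResource() {
    lock.readLock().lock();
    try {
      return allocated.copyOf();
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Note the clone matters as much as the lock: without it, a caller holding the returned reference could see (or cause) changes after the lock is released.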
[jira] [Commented] (YARN-4068) Support appUpdated event in TimelineV2 to publish details for movetoqueue, change in priority
[ https://issues.apache.org/jira/browse/YARN-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738821#comment-14738821 ] Sunil G commented on YARN-4068: --- Thank you very much [~Naganarasimha Garla]. > Support appUpdated event in TimelineV2 to publish details for movetoqueue, > change in priority > - > > Key: YARN-4068 > URL: https://issues.apache.org/jira/browse/YARN-4068 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sunil G >Assignee: Sunil G > > YARN-4044 supports appUpdated event changes to TimelineV1. This jira is to > track and port appUpdated changes in V2 for > - movetoqueue > - updateAppPriority -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2410) Nodemanager ShuffleHandler can possible exhaust file descriptors
[ https://issues.apache.org/jira/browse/YARN-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738773#comment-14738773 ] Nathan Roberts commented on YARN-2410: -- Thanks for the additional code comments. +1 > Nodemanager ShuffleHandler can possible exhaust file descriptors > > > Key: YARN-2410 > URL: https://issues.apache.org/jira/browse/YARN-2410 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.5.0 >Reporter: Nathan Roberts >Assignee: Kuhu Shukla > Attachments: YARN-2410-v1.patch, YARN-2410-v10.patch, > YARN-2410-v11.patch, YARN-2410-v2.patch, YARN-2410-v3.patch, > YARN-2410-v4.patch, YARN-2410-v5.patch, YARN-2410-v6.patch, > YARN-2410-v7.patch, YARN-2410-v8.patch, YARN-2410-v9.patch > > > The async nature of the shufflehandler can cause it to open a huge number of > file descriptors; when it runs out, it crashes. > Scenario: > Job with 6K reduces, slow start set to 0.95, about 40 map outputs per node. > Let's say all 6K reduces hit a node at about the same time asking for their > outputs. Each reducer will ask for all 40 map outputs over a single socket in > a > single request (not necessarily all 40 at once, but with coalescing it is > likely to be a large number). > sendMapOutput() will open the file for random reading and then perform an > async transfer of the particular portion of this file(). This will > theoretically > happen 6000*40=240,000 times, which will run the NM out of file descriptors and > cause it to crash. > The algorithm should be refactored a little to not open the fds until they're > actually needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738728#comment-14738728 ] Junping Du commented on YARN-313: - Thanks [~elgoiri] for updating the patch! Overall the current patch LGTM, but just a few nits: 1. After thinking again, we should mark the newly added APIs as Evolving instead of Stable, e.g. RefreshResourcesRequest and RefreshResourcesResponse. 2. Tests for the PB implementations of RefreshResourcesRequest and RefreshResourcesResponse need to be added to TestPBImplRecords.java like the other protocol records. 3. Fix the checkstyle issues reported by Jenkins (ignore the first one, as we can do nothing about it). > Add Admin API for supporting node resource configuration in command line > > > Key: YARN-313 > URL: https://issues.apache.org/jira/browse/YARN-313 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client >Reporter: Junping Du >Assignee: Inigo Goiri >Priority: Critical > Attachments: YARN-313-sample.patch, YARN-313-v1.patch, > YARN-313-v2.patch, YARN-313-v3.patch, YARN-313-v4.patch, YARN-313-v5.patch, > YARN-313-v6.patch, YARN-313-v7.patch, YARN-313-v8.patch, YARN-313-v9.patch > > > We should provide some admin interface, e.g. "yarn rmadmin -refreshResources" > to support changes of node's resource specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4134) FairScheduler preemption stops at queue level that all child queues are not over their fairshare
[ https://issues.apache.org/jira/browse/YARN-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated YARN-4134: -- Attachment: YARN-4134.003.patch A tiny fix. > FairScheduler preemption stops at queue level that all child queues are not > over their fairshare > > > Key: YARN-4134 > URL: https://issues.apache.org/jira/browse/YARN-4134 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Xianyin Xin >Assignee: Xianyin Xin > Attachments: YARN-4134.001.patch, YARN-4134.002.patch, > YARN-4134.003.patch > >
> FairScheduler now uses a choose-a-candidate method to select a container
> to be preempted from the leaf queues, in {{FSParentQueue.preemptContainer()}}:
> {code}
> readLock.lock();
> try {
>   for (FSQueue queue : childQueues) {
>     if (candidateQueue == null ||
>         comparator.compare(queue, candidateQueue) > 0) {
>       candidateQueue = queue;
>     }
>   }
> } finally {
>   readLock.unlock();
> }
> // Let the selected queue choose which of its containers to preempt
> if (candidateQueue != null) {
>   toBePreempted = candidateQueue.preemptContainer();
> }
> {code}
> A candidate child queue is selected. However, if the queue's usage isn't over
> its fairshare, preemption will not happen:
> {code}
> if (!preemptContainerPreCheck()) {
>   return toBePreempted;
> }
> {code}
> A scenario:
> {code}
>         root
>        /    \
>   queue1    queue2
>             /    \
>      queue2.3    (queue2.4)
> {code}
> Suppose there are 8 containers, and queues at any level have the same weight.
> queue1 takes 4 and queue2.3 takes 4, so both queue1 and queue2 are at their
> fairshare. Now we submit an app in queue2.4 that needs 4 containers; it
> should preempt 2 from queue2.3, but the candidate-container selection
> procedure will stop at queue1, so none of the containers will be preempted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
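The selection bug described in the issue can be reproduced with a toy model of the queue hierarchy. The classes below are invented stand-ins, not the real FairScheduler types; they only mirror the two behaviors at fault: the single-candidate choice and the per-queue fair-share pre-check.

```java
import java.util.List;

// Toy model of hierarchical preemption: a parent picks exactly one child as
// the candidate, and the pre-check stops preemption at any queue that is
// not over its own fair share -- even if a descendant is.
public class PreemptionSketch {
  static class SchedQueue {
    final String name;
    final double usage;
    final double fairShare;
    final List<SchedQueue> children;

    SchedQueue(String name, double usage, double fairShare,
               List<SchedQueue> children) {
      this.name = name; this.usage = usage;
      this.fairShare = fairShare; this.children = children;
    }

    // Mirrors preemptContainerPreCheck(): only preempt from queues
    // strictly over their fair share.
    boolean overFairShare() { return usage > fairShare; }

    // Mirrors the candidate selection: pick the child most over its fair
    // share and recurse into it. Returns the leaf preempted from, or null.
    String preemptContainer() {
      if (!overFairShare()) {
        return null; // the bug: traversal stops here
      }
      if (children.isEmpty()) {
        return name; // pretend one container is preempted from this leaf
      }
      SchedQueue candidate = null;
      for (SchedQueue q : children) {
        if (candidate == null
            || (q.usage - q.fairShare) > (candidate.usage - candidate.fairShare)) {
          candidate = q;
        }
      }
      return candidate.preemptContainer();
    }
  }

  public static void main(String[] args) {
    // 8 containers: queue1 holds 4 (fair share 4); queue2.3 holds 4 but its
    // fair share dropped to 2 once queue2.4 became active.
    SchedQueue q23 = new SchedQueue("queue2.3", 4, 2, List.of());
    SchedQueue q24 = new SchedQueue("queue2.4", 0, 2, List.of());
    SchedQueue q2 = new SchedQueue("queue2", 4, 4, List.of(q23, q24));
    SchedQueue q1 = new SchedQueue("queue1", 4, 4, List.of());
    SchedQueue root = new SchedQueue("root", 8, 8, List.of(q1, q2));
    // queue2.3 is over its fair share, yet nothing is preempted: root (and
    // queue2) are exactly at theirs, so the pre-check bails out early.
    System.out.println(root.preemptContainer()); // null
  }
}
```

Calling `preemptContainer()` directly on the over-share leaf would succeed, which is exactly the asymmetry the patch addresses: the recursion must not require every ancestor to be over its fair share.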
[jira] [Commented] (YARN-2609) Example of use for the ReservationSystem
[ https://issues.apache.org/jira/browse/YARN-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738670#comment-14738670 ] Bibin A Chundatt commented on YARN-2609: A minor comment from my side. If no parameters are passed:
{code}
java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.hadoop.examples.ReservationClientDemo.run(ReservationClientDemo.java:95)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
{code}
# A parameter check can be done
# Adding a usage message for the example class would also be good
> Example of use for the ReservationSystem > > > Key: YARN-2609 > URL: https://issues.apache.org/jira/browse/YARN-2609 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Carlo Curino >Assignee: Carlo Curino >Priority: Minor > Attachments: YARN-2609.docx, YARN-2609.patch > > > This JIRA provides a simple new example in mapreduce-examples that requests a > reservation and submits a Pi computation in the reservation. This is meant > just to show how to interact with the reservation system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
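The suggested parameter check might look like this minimal sketch. The class name, usage string, and parameter are illustrative, not the demo's actual code:

```java
// Sketch of failing fast with a usage message instead of letting args[0]
// throw ArrayIndexOutOfBoundsException when no arguments are supplied.
public class ArgCheckSketch {
  static int run(String[] args) {
    if (args.length < 1) {
      // Print usage and return a non-zero code, as ToolRunner expects.
      System.err.println("Usage: ReservationClientDemo <queue> [options]");
      return -1;
    }
    String queue = args[0]; // safe: length checked above
    System.out.println("Submitting reservation to queue " + queue);
    return 0;
  }

  public static void main(String[] args) {
    System.exit(run(args));
  }
}
```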
[jira] [Updated] (YARN-4134) FairScheduler preemption stops at queue level that all child queues are not over their fairshare
[ https://issues.apache.org/jira/browse/YARN-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated YARN-4134: -- Attachment: YARN-4134.002.patch Remove testing remnant. > FairScheduler preemption stops at queue level that all child queues are not > over their fairshare > > > Key: YARN-4134 > URL: https://issues.apache.org/jira/browse/YARN-4134 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Xianyin Xin >Assignee: Xianyin Xin > Attachments: YARN-4134.001.patch, YARN-4134.002.patch > >
> FairScheduler now uses a choose-a-candidate method to select a container
> to be preempted from the leaf queues, in {{FSParentQueue.preemptContainer()}}:
> {code}
> readLock.lock();
> try {
>   for (FSQueue queue : childQueues) {
>     if (candidateQueue == null ||
>         comparator.compare(queue, candidateQueue) > 0) {
>       candidateQueue = queue;
>     }
>   }
> } finally {
>   readLock.unlock();
> }
> // Let the selected queue choose which of its containers to preempt
> if (candidateQueue != null) {
>   toBePreempted = candidateQueue.preemptContainer();
> }
> {code}
> A candidate child queue is selected. However, if the queue's usage isn't over
> its fairshare, preemption will not happen:
> {code}
> if (!preemptContainerPreCheck()) {
>   return toBePreempted;
> }
> {code}
> A scenario:
> {code}
>         root
>        /    \
>   queue1    queue2
>             /    \
>      queue2.3    (queue2.4)
> {code}
> Suppose there are 8 containers, and queues at any level have the same weight.
> queue1 takes 4 and queue2.3 takes 4, so both queue1 and queue2 are at their
> fairshare. Now we submit an app in queue2.4 that needs 4 containers; it
> should preempt 2 from queue2.3, but the candidate-container selection
> procedure will stop at queue1, so none of the containers will be preempted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4134) FairScheduler preemption stops at queue level that all child queues are not over their fairshare
[ https://issues.apache.org/jira/browse/YARN-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated YARN-4134: -- Attachment: YARN-4134.001.patch Upload a patch for preview. > FairScheduler preemption stops at queue level that all child queues are not > over their fairshare > > > Key: YARN-4134 > URL: https://issues.apache.org/jira/browse/YARN-4134 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Xianyin Xin >Assignee: Xianyin Xin > Attachments: YARN-4134.001.patch > > > Now FairScheduler uses a choose-a-candidate method to select a container to be > preempted from the leaf queues; in {{FSParentQueue.preemptContainer()}}, > {code} > readLock.lock(); > try { > for (FSQueue queue : childQueues) { > if (candidateQueue == null || > comparator.compare(queue, candidateQueue) > 0) { > candidateQueue = queue; > } > } > } finally { > readLock.unlock(); > } > // Let the selected queue choose which of its container to preempt > if (candidateQueue != null) { > toBePreempted = candidateQueue.preemptContainer(); > } > {code} > a candidate child queue is selected. However, if the queue's usage isn't over > its fair share, preemption will not happen: > {code} > if (!preemptContainerPreCheck()) { > return toBePreempted; > } > {code} > A scenario: > {code} > root >/\ > queue1 queue2 >/\ > queue2.3, ( queue2.4 ) > {code} > Suppose there are 8 containers, and queues at any level have the same weight. > queue1 takes 4 and queue2.3 takes 4, so both queue1 and queue2 are at their > fair share. Now we submit an app in queue2.4 that needs 4 containers; it > should preempt 2 from queue2.3, but the candidate-containers selection > procedure will stop at queue1, so none of the containers will be preempted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738637#comment-14738637 ] Jian He edited comment on YARN-4126 at 9/10/15 12:15 PM: - [~bibinchundatt], thanks for working on this! Calling initializeUserGroupSecureMode everywhere in all related test cases does not seem like an elegant solution. Why is this call needed? Could you do it in a cleaner way? was (Author: jianhe): [~bibindeve...@gmail.com], thanks for working on this! Calling initializeUserGroupSecureMode everywhere in all related test cases does not seem like an elegant solution. Why is this call needed? Could you do it in a cleaner way? > RM should not issue delegation tokens in unsecure mode > -- > > Key: YARN-4126 > URL: https://issues.apache.org/jira/browse/YARN-4126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, > 0003-YARN-4126.patch, 0004-YARN-4126.patch, 0005-YARN-4126.patch > > > ClientRMService#getDelegationToken is currently returning a delegation token > in insecure mode. We should not return the token if it's in insecure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4126) RM should not issue delegation tokens in unsecure mode
[ https://issues.apache.org/jira/browse/YARN-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738637#comment-14738637 ] Jian He commented on YARN-4126: --- [~bibindeve...@gmail.com], thanks for working on this! Calling initializeUserGroupSecureMode everywhere in all related test cases does not seem like an elegant solution. Why is this call needed? Could you do it in a cleaner way? > RM should not issue delegation tokens in unsecure mode > -- > > Key: YARN-4126 > URL: https://issues.apache.org/jira/browse/YARN-4126 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4126.patch, 0002-YARN-4126.patch, > 0003-YARN-4126.patch, 0004-YARN-4126.patch, 0005-YARN-4126.patch > > > ClientRMService#getDelegationToken is currently returning a delegation token > in insecure mode. We should not return the token if it's in insecure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
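One cleaner alternative to repeating the call in every test (a hedged sketch, not the actual patch under review, and all names below are illustrative): hoist the initializeUserGroupSecureMode setup into a shared, idempotent helper on a test base class, so each test case inherits it instead of duplicating it.

```java
// Hypothetical base class for the secure-mode tests; the stand-in inside
// initSecureModeOnce() would be replaced by the real
// initializeUserGroupSecureMode / UserGroupInformation setup.
public class SecureModeTestBase {
    private static boolean initialized = false;

    // Idempotent: safe to call from every subclass's setup hook
    // (e.g. a JUnit @BeforeClass method) without repeating the logic.
    public static synchronized void initSecureModeOnce() {
        if (!initialized) {
            // stand-in for the actual secure-mode configuration
            initialized = true;
        }
    }

    public static boolean isInitialized() {
        return initialized;
    }

    public static void main(String[] args) {
        initSecureModeOnce();
        initSecureModeOnce(); // second call is a no-op
        System.out.println(isInitialized());
    }
}
```

Each affected test class would then extend the base class rather than carrying its own initialization call.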
[jira] [Commented] (YARN-4081) Add support for multiple resource types in the Resource class
[ https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738611#comment-14738611 ] Hadoop QA commented on YARN-4081: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 18s | Pre-patch YARN-3926 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 8m 5s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 12s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 32s | The applied patch generated 1 new checkstyle issues (total was 10, now 3). | | {color:green}+1{color} | whitespace | 0m 7s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 31s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 13s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 2s | Tests passed in hadoop-yarn-common. 
| | | | 46m 38s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12755106/YARN-4081-YARN-3926.008.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-3926 / 1dbd8e3 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9076/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9076/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9076/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9076/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9076/console | This message was automatically generated. > Add support for multiple resource types in the Resource class > - > > Key: YARN-4081 > URL: https://issues.apache.org/jira/browse/YARN-4081 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4081-YARN-3926.001.patch, > YARN-4081-YARN-3926.002.patch, YARN-4081-YARN-3926.003.patch, > YARN-4081-YARN-3926.004.patch, YARN-4081-YARN-3926.005.patch, > YARN-4081-YARN-3926.006.patch, YARN-4081-YARN-3926.007.patch, > YARN-4081-YARN-3926.008.patch > > > For adding support for multiple resource types, we need to add support for > this in the Resource class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4131) Add API and CLI to kill container on given containerId
[ https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738588#comment-14738588 ] Junping Du commented on YARN-4131: -- I think we need coordination between the YARN-445 and YARN-4131 work. Maybe an offline call would be more feasible. I will send an invitation to the related people. > Add API and CLI to kill container on given containerId > -- > > Key: YARN-4131 > URL: https://issues.apache.org/jira/browse/YARN-4131 > Project: Hadoop YARN > Issue Type: Sub-task > Components: applications, client >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch, > YARN-4131-v1.1.patch, YARN-4131-v1.2.patch, YARN-4131-v1.patch > > > Per YARN-3337, we need a handy tool to kill containers in some scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.
[ https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738568#comment-14738568 ] Jian He edited comment on YARN-1651 at 9/10/15 10:45 AM: - bq. I think we may need add such information to AMRMProtocol to make sure AM will be notified. For now, we can keep them as-is. Users can still get such information from RM logs. I think for now we can fail the allocate call explicitly on those very clear situations in checkAndNormalizeContainerChangeRequest ?, e.g. the situation that rmContainer doesn't exist That's more explicit to users. Digging through logs is not an easy thing for application writer. thanks for updating, Wangda ! some more comments focusing on decreasing code path. - this may be not correct, because reserve event can happen on RESERVE state too, i.e. reReservation {code} if (container.getState() != RMContainerState.NEW) { container.hasIncreaseReservation = true; } {code} - RMNodeImpl#toBeDecreasedContainers - no need to be a map, it can be a list ? and therefore NodeHeartBeatResponse and Impl change is not needed; similarly nmReportedIncreasedContainers can be a list. - When decreasing a container, should it send RMNodeDecreaseContainerEvent too ? - revert ContainerManagerImpl change - Remove SchedulerApplicationAttempt#getIncreaseRequests - In AbstractYarnScheduler#deceraseContainers() move checkAndNormalizeContainerChangeRequests(decreaseRequests, false) to the same place as checkAndNormalizeContainerChangeRequests(increaseRequests, false) for consistency. - this if condition is not needed. {code} public boolean unreserve(Priority priority, FiCaSchedulerNode node, RMContainer rmContainer) { if (rmContainer.hasIncreaseReservation()) { rmContainer.cancelIncreaseReservation(); } {code} - looks like when decreasing reservedIncreasedContainer, it will unreserve the *whole* extra reserved resource, should it only unreserve the extra resources being decresed ? 
- In general, I think we should be able to decrease/increase a regular reserved container or a increasedReservedContainer ? - In ParentQueue, this null check is not needed. {code} @Override public void decreaseContainer(Resource clusterResource, SchedContainerChangeRequest decreaseRequest, FiCaSchedulerApp app) { if (app != null) { {code} - allocate call is specifically marked as noLock, but now every allocate call holds the global scheduler lock which is too expensive. we can move decreaseContainer to application itself. {code} protected synchronized void decreaseContainer( {code} It is also now holding queue Lock on allocate, which is also expensive, because that means a bunch of AMs calling allocate very frequently can effectively block queue's execuation. was (Author: jianhe): bq. I think we may need add such information to AMRMProtocol to make sure AM will be notified. For now, we can keep them as-is. Users can still get such information from RM logs. I think for now we can fail the allocate call explicitly on those very clear situations in checkAndNormalizeContainerChangeRequest ?, e.g. the situation that rmContainer doesn't exist That's more explicit to users. Digging through logs is not an easy thing for application writer. thanks for updating, Wangda ! some more comments focusing on decreasing code path. - this may be not correct, because reserve event can happen on RESERVE state too, i.e. reReservation {code} if (container.getState() != RMContainerState.NEW) { container.hasIncreaseReservation = true; } {code} - RMNodeImpl#toBeDecreasedContainers - no need to be a map, it can be a list ? and therefore NodeHeartBeatResponse and Impl change is not needed; similarly nmReportedIncreasedContainers can be a list. - When decreasing a container, should it send RMNodeDecreaseContainerEvent too ? 
- revert ContainerManagerImpl change - Remove SchedulerApplicationAttempt#getIncreaseRequests - In AbstractYarnScheduler#deceraseContainers() move checkAndNormalizeContainerChangeRequests(decreaseRequests, false) to the same place as checkAndNormalizeContainerChangeRequests(increaseRequests, false) for consistency. - this if condition is not needed. {code} public boolean unreserve(Priority priority, FiCaSchedulerNode node, RMContainer rmContainer) { if (rmContainer.hasIncreaseReservation()) { rmContainer.cancelIncreaseReservation(); } {code} - looks like when decreasing reservedIncreasedContainer, it will unreserve the *whole* extra reserved resource, should it only unreserve the extra resources being decresed ? - In general, I think we should be able to decrease/increase a regular reserved container or a increasedReservedContainer ? - In ParentQueue, this null check is not needed. {code} @Override public void decreaseContain
[jira] [Commented] (YARN-1651) CapacityScheduler side changes to support increase/decrease container resource.
[ https://issues.apache.org/jira/browse/YARN-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738568#comment-14738568 ] Jian He commented on YARN-1651: --- bq. I think we may need to add such information to AMRMProtocol to make sure the AM will be notified. For now, we can keep them as-is. Users can still get such information from RM logs. I think for now we can fail the allocate call explicitly in those very clear situations in checkAndNormalizeContainerChangeRequest, e.g. the situation where rmContainer doesn't exist. That's more explicit to users; digging through logs is not an easy thing for application writers. Thanks for updating, Wangda! Some more comments, focusing on the decreasing code path. - this may not be correct, because the reserve event can happen in the RESERVED state too, i.e. reReservation {code} if (container.getState() != RMContainerState.NEW) { container.hasIncreaseReservation = true; } {code} - RMNodeImpl#toBeDecreasedContainers - it doesn't need to be a map, it can be a list? And therefore the NodeHeartBeatResponse and Impl change is not needed; similarly nmReportedIncreasedContainers can be a list. - When decreasing a container, should it send RMNodeDecreaseContainerEvent too? - revert the ContainerManagerImpl change - remove SchedulerApplicationAttempt#getIncreaseRequests - in AbstractYarnScheduler#deceraseContainers() move checkAndNormalizeContainerChangeRequests(decreaseRequests, false) to the same place as checkAndNormalizeContainerChangeRequests(increaseRequests, false) for consistency. - this if condition is not needed. {code} public boolean unreserve(Priority priority, FiCaSchedulerNode node, RMContainer rmContainer) { if (rmContainer.hasIncreaseReservation()) { rmContainer.cancelIncreaseReservation(); } {code} - looks like when decreasing a reservedIncreasedContainer, it will unreserve the *whole* extra reserved resource; should it only unreserve the extra resources being decreased? 
- In general, I think we should be able to decrease/increase a regular reserved container or an increasedReservedContainer? - In ParentQueue, this null check is not needed. {code} @Override public void decreaseContainer(Resource clusterResource, SchedContainerChangeRequest decreaseRequest, FiCaSchedulerApp app) { if (app != null) { {code} - the allocate call is specifically marked as noLock, but now every allocate call holds the global scheduler lock, which is too expensive. We can move decreaseContainer into the application itself. {code} protected synchronized void decreaseContainer( {code} It is also now holding the queue lock on allocate, which is also expensive, because that means a bunch of malicious AMs can effectively block the queue's execution. > CapacityScheduler side changes to support increase/decrease container > resource. > --- > > Key: YARN-1651 > URL: https://issues.apache.org/jira/browse/YARN-1651 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-1651-1.YARN-1197.patch, > YARN-1651-2.YARN-1197.patch, YARN-1651-3.YARN-1197.patch, > YARN-1651-4.YARN-1197.patch, YARN-1651-5.YARN-1197.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
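The locking concern in the last point can be sketched in isolation (illustrative names only, not the actual CapacityScheduler code): when decreaseContainer is a synchronized method on the scheduler, every AM's allocate call serializes on one global lock; pushing the state change down onto the per-application object narrows the critical section so different applications can proceed concurrently.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of moving a container-decrease onto the application
// object instead of a scheduler-wide synchronized method.
public class PerAppLockSketch {
    static class App {
        private int allocatedMb;
        App(int mb) { allocatedMb = mb; }
        // Locked per-app: two different apps can decrease concurrently.
        synchronized void decreaseContainer(int deltaMb) {
            allocatedMb -= deltaMb;
        }
        synchronized int getAllocatedMb() { return allocatedMb; }
    }

    private final ConcurrentHashMap<String, App> apps = new ConcurrentHashMap<>();

    public void register(String appId, int mb) { apps.put(appId, new App(mb)); }

    // No scheduler-wide lock here: the lookup is lock-free and the
    // mutation is guarded only by the target app's own monitor.
    public void decrease(String appId, int deltaMb) {
        App app = apps.get(appId);
        if (app != null) {
            app.decreaseContainer(deltaMb);
        }
    }

    public int allocatedMbOf(String appId) { return apps.get(appId).getAllocatedMb(); }

    public static void main(String[] args) {
        PerAppLockSketch s = new PerAppLockSketch();
        s.register("app1", 4096);
        s.decrease("app1", 1024);
        System.out.println(s.allocatedMbOf("app1")); // prints 3072
    }
}
```

The trade-off is that any invariant spanning multiple applications (e.g. queue-level accounting) still needs its own coarser synchronization, which is presumably why the review discusses the queue lock separately.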
[jira] [Updated] (YARN-4081) Add support for multiple resource types in the Resource class
[ https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-4081: Attachment: YARN-4081-YARN-3926.008.patch Attaching a new version of the patch without the web services changes. [~leftnoteasy] had concerns that we don't have existing tests to make sure the web services changes won't break existing APIs. This will lead to failing unit tests, which will be addressed in later patches (once we add unit tests to validate that we won't break the REST API). > Add support for multiple resource types in the Resource class > - > > Key: YARN-4081 > URL: https://issues.apache.org/jira/browse/YARN-4081 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4081-YARN-3926.001.patch, > YARN-4081-YARN-3926.002.patch, YARN-4081-YARN-3926.003.patch, > YARN-4081-YARN-3926.004.patch, YARN-4081-YARN-3926.005.patch, > YARN-4081-YARN-3926.006.patch, YARN-4081-YARN-3926.007.patch, > YARN-4081-YARN-3926.008.patch > > > For adding support for multiple resource types, we need to add support for > this in the Resource class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738559#comment-14738559 ] Varun Saxena commented on YARN-4075: Ok, will rebase the patch. Maybe after reviewing 3901 and 4074 > [reader REST API] implement support for querying for flows and flow runs > > > Key: YARN-4075 > URL: https://issues.apache.org/jira/browse/YARN-4075 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-4075-YARN-2928.POC.1.patch > > > We need to be able to query for flows and flow runs via REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4131) Add API and CLI to kill container on given containerId
[ https://issues.apache.org/jira/browse/YARN-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738377#comment-14738377 ] Steve Loughran commented on YARN-4131: -- For fault injection / chaos monkey testing I want containers killed without warning, so as to test how the app and its AM handle it. It should look exactly like any of the infrastructure failures: container exit, YARN OOM event, pre-emption, node failure, ... Signalling is meant to give the AM the opportunity to send events (like a clean shutdown signal) to apps. > Add API and CLI to kill container on given containerId > -- > > Key: YARN-4131 > URL: https://issues.apache.org/jira/browse/YARN-4131 > Project: Hadoop YARN > Issue Type: Sub-task > Components: applications, client >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-4131-demo-2.patch, YARN-4131-demo.patch, > YARN-4131-v1.1.patch, YARN-4131-v1.2.patch, YARN-4131-v1.patch > > > Per YARN-3337, we need a handy tool to kill containers in some scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4068) Support appUpdated event in TimelineV2 to publish details for movetoqueue, change in priority
[ https://issues.apache.org/jira/browse/YARN-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738364#comment-14738364 ] Naganarasimha G R commented on YARN-4068: - already linked YARN-4129 > Support appUpdated event in TimelineV2 to publish details for movetoqueue, > change in priority > - > > Key: YARN-4068 > URL: https://issues.apache.org/jira/browse/YARN-4068 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sunil G >Assignee: Sunil G > > YARN-4044 supports appUpdated event changes to TimelineV1. This jira is to > track and port appUpdated changes in V2 for > - movetoqueue > - updateAppPriority -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4111) Killed application diagnostics message should be set rather having static mesage
[ https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738348#comment-14738348 ] Hadoop QA commented on YARN-4111: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 39s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 56s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 52s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 52s | The applied patch generated 1 new checkstyle issues (total was 299, now 300). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 27s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 54m 19s | Tests passed in hadoop-yarn-server-resourcemanager. 
| | | | 93m 35s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12755071/YARN-4111_2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f153710 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9075/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9075/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9075/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9075/console | This message was automatically generated. > Killed application diagnostics message should be set rather having static > mesage > > > Key: YARN-4111 > URL: https://issues.apache.org/jira/browse/YARN-4111 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: nijel > Attachments: YARN-4111_1.patch, YARN-4111_2.patch > > > Application can be killed either by *user via ClientRMService* OR *from > scheduler*. Currently diagnostic message is set statically i.e {{Application > killed by user.}} neverthless of application killed by scheduler. This brings > the confusion to the user after application is Killed that he did not kill > application at all but diagnostic message depicts that 'application is killed > by user'. > It would be useful if the diagnostic message are different for each cause of > KILL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4136) LinuxContainerExecutor loses info when forwarding ResourceHandlerException
[ https://issues.apache.org/jira/browse/YARN-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738326#comment-14738326 ] Varun Vasudev commented on YARN-4136: - +1 for the patch. I'll commit this tomorrow if no one objects. > LinuxContainerExecutor loses info when forwarding ResourceHandlerException > -- > > Key: YARN-4136 > URL: https://issues.apache.org/jira/browse/YARN-4136 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.7.1 >Reporter: Steve Loughran >Assignee: Bibin A Chundatt >Priority: Trivial > Attachments: 0001-YARN-4136.patch > > > The Linux container executor {{launchContainer}} method throws > {{ResourceHandlerException}} when there are problems setting up the container > -but these aren't propagated in the raised IOE. They should be nested with > the string value included in the message text. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4111) Killed application diagnostics message should be set rather having static mesage
[ https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738293#comment-14738293 ] Sunil G commented on YARN-4111: --- Hi [~nijel] One minor nit: {{RMAppKilledAttemptEvent}} is used for both RMApp and RMAppAttempt. The name is slightly confusing; I think we can use this only for RMApp. Also, in RMAppAttempt, {{RMAppFailedAttemptEvent}} is changed to {{RMAppKilledAttemptEvent}}. Could we generalize RMAppFailedAttemptEvent to cover both Failed and Killed, so that it can also carry diagnostics? > Killed application diagnostics message should be set rather having static > mesage > > > Key: YARN-4111 > URL: https://issues.apache.org/jira/browse/YARN-4111 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: nijel > Attachments: YARN-4111_1.patch, YARN-4111_2.patch > > > An application can be killed either by the *user via ClientRMService* OR *from > the scheduler*. Currently the diagnostic message is set statically, i.e. {{Application > killed by user.}}, regardless of whether the application was killed by the scheduler. This confuses > the user after the application is killed: he did not kill the > application at all, yet the diagnostic message states that the 'application is killed > by user'. > It would be useful if the diagnostic messages were different for each cause of > KILL. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
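The generalization suggested above might look like the following sketch (the class and enum names are hypothetical, not the actual RM event types): a single terminal-attempt event that carries both the cause and a caller-supplied diagnostic string, so "killed by user" and "killed by scheduler" naturally produce different messages.

```java
// Hypothetical generalized attempt event; real RM events extend an event
// base class and carry an ApplicationAttemptId, omitted here for brevity.
public class AttemptTerminatedEvent {
    public enum Cause { KILLED_BY_USER, KILLED_BY_SCHEDULER, FAILED }

    private final String attemptId;
    private final Cause cause;
    private final String diagnostics;   // set by the caller, not a static string

    public AttemptTerminatedEvent(String attemptId, Cause cause, String diagnostics) {
        this.attemptId = attemptId;
        this.cause = cause;
        this.diagnostics = diagnostics;
    }

    public String getAttemptId() { return attemptId; }
    public Cause getCause() { return cause; }
    public String getDiagnostics() { return diagnostics; }

    public static void main(String[] args) {
        AttemptTerminatedEvent e = new AttemptTerminatedEvent(
            "appattempt_1_0001_000001", Cause.KILLED_BY_SCHEDULER,
            "Application killed by the scheduler");
        System.out.println(e.getCause() + ": " + e.getDiagnostics());
    }
}
```

Each kill path would then construct the event with its own diagnostic text, instead of every consumer seeing the same static "Application killed by user." message.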