[jira] [Updated] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated YARN-3943:
----------------------------
    Attachment:     (was: YARN-3943.001.patch)

> Use separate threshold configurations for disk-full detection and disk-not-full detection.
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-3943
>                 URL: https://issues.apache.org/jira/browse/YARN-3943
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Critical
>         Attachments: YARN-3943.000.patch
>
>
> Use separate threshold configurations to check when disks become full and when disks become good. Currently the configurations "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are used to check both when disks become full and when disks become good. It would be better to use two configurations: one applied when a disk goes from not-full to full and the other when it goes from full to not-full, so that the state does not oscillate frequently. For example, we can set the threshold for disk-full detection higher than the one for disk-not-full detection.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
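For illustration, the two-threshold (hysteresis) behaviour described above could look like the following sketch; the class and field names are hypothetical, not those used in the attached patches:

{code}
// Hypothetical sketch of two-threshold (hysteresis) disk checking.
// Names such as maxUtilizationToMarkFull are illustrative only.
public class DiskFullnessChecker {
  private final float maxUtilizationToMarkFull;  // e.g. 95.0f
  private final float minUtilizationToMarkGood;  // e.g. 90.0f
  private boolean diskFull = false;

  public DiskFullnessChecker(float fullThreshold, float goodThreshold) {
    this.maxUtilizationToMarkFull = fullThreshold;
    this.minUtilizationToMarkGood = goodThreshold;
  }

  /** Returns true if the disk should currently be treated as full. */
  public boolean checkDisk(float usedPercentage) {
    if (diskFull) {
      // Only mark the disk good again once usage drops below the lower
      // threshold, so the state cannot oscillate around a single boundary.
      diskFull = usedPercentage >= minUtilizationToMarkGood;
    } else {
      diskFull = usedPercentage >= maxUtilizationToMarkFull;
    }
    return diskFull;
  }
}
{code}

The point of the gap between the two thresholds is that a disk hovering near 95% usage flips state at most once, instead of toggling between full and good on every health-check cycle.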
[jira] [Updated] (YARN-4155) TestLogAggregationService.testLogAggregationServiceWithInterval failing
[ https://issues.apache.org/jira/browse/YARN-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bibin A Chundatt updated YARN-4155:
-----------------------------------
    Attachment: 0003-YARN-4155.patch

Hi [~ste...@apache.org], I looked into the issue again and it appears to be a timing issue. {{TestLogAggregationService#numOfLogsAvailable}} is checked immediately after the {{aggregator.doLogAggregationOutOfBand()}} call, but while log aggregation is still in progress the log file name carries the *.tmp* extension. In {{TestLogAggregationService#numOfLogsAvailable}}:
{code}
if (filename.contains(LogAggregationUtils.TMP_FILE_SUFFIX)
    || (lastLogFile != null && filename.contains(lastLogFile)
        && sizeLimited)) {
  LOG.info("fileName :" + filename);
  LOG.info("lastLogFile :" + lastLogFile);
  return -1;
}
{code}
so the method returns -1. Attaching a patch based on this analysis.

> TestLogAggregationService.testLogAggregationServiceWithInterval failing
> ------------------------------------------------------------------------
>
>                 Key: YARN-4155
>                 URL: https://issues.apache.org/jira/browse/YARN-4155
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 3.0.0
>         Environment: Jenkins
>            Reporter: Steve Loughran
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: 0001-YARN-4155.patch, 0001-YARN-4155.patch, 0003-YARN-4155.patch
>
>
> Test failing on Jenkins:
> {{TestLogAggregationService.testLogAggregationServiceWithInterval}}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
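A common way to harden such a check (shown only as a sketch; the attached patch may well do it differently) is to poll inside the test method until the *.tmp* file disappears, e.g. with {{GenericTestUtils.waitFor}}; the parameter list of {{numOfLogsAvailable}} is assumed here:

{code}
import java.util.concurrent.TimeoutException;
import org.apache.hadoop.test.GenericTestUtils;
import com.google.common.base.Supplier;

// Sketch only: instead of asserting immediately after
// doLogAggregationOutOfBand(), poll until aggregation has finished,
// i.e. until numOfLogsAvailable() stops returning -1 (which it does
// while a *.tmp file is still present).
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    // Parameter list assumed for illustration.
    return numOfLogsAvailable(logAggregationService, appId, true, null) != -1;
  }
}, 100, 10000); // re-check every 100 ms, fail the test after 10 s
{code}

Polling with a timeout keeps the test deterministic on slow Jenkins machines while still failing promptly if aggregation never completes.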
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909141#comment-14909141 ] Hadoop QA commented on YARN-3943: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 19m 42s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 8m 17s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 8s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 54s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 30s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 25s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 1s | Tests passed in hadoop-yarn-common. | | {color:green}+1{color} | yarn tests | 8m 41s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 58m 39s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12762506/YARN-3943.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7fe521b | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9272/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9272/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9272/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9272/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9272/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9272/console | This message was automatically generated. > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3943.000.patch, YARN-3943.001.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. 
> Currently the configurations "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are used to check both when disks become full and when disks become good. It would be better to use two configurations: one applied when a disk goes from not-full to full and the other when it goes from full to not-full, so that the state does not oscillate frequently. For example, we can set the threshold for disk-full detection higher than the one for disk-not-full detection.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909117#comment-14909117 ] Hadoop QA commented on YARN-3943: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 19m 17s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 52s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 39s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 56s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 3s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 53s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 42s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 26s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 19s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 8m 53s | Tests failed in hadoop-yarn-server-nodemanager. | | | | 60m 55s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12762496/YARN-3943.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 67b0e96 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9271/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9271/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9271/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9271/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9271/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9271/console | This message was automatically generated. > Use separate threshold configurations for disk-full detection and > disk-not-full detection. > -- > > Key: YARN-3943 > URL: https://issues.apache.org/jira/browse/YARN-3943 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3943.000.patch, YARN-3943.001.patch > > > Use separate threshold configurations to check when disks become full and > when disks become good. 
> Currently the configurations "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are used to check both when disks become full and when disks become good. It would be better to use two configurations: one applied when a disk goes from not-full to full and the other when it goes from full to not-full, so that the state does not oscillate frequently. For example, we can set the threshold for disk-full detection higher than the one for disk-not-full detection.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-4111) Killed application diagnostics message should be set rather than having a static message
[ https://issues.apache.org/jira/browse/YARN-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909120#comment-14909120 ]

nijel commented on YARN-4111:
-----------------------------
Thanks [~rohithsharma] and [~sunilg] for the comments.
If we add the new constructor to carry the message, can other event classes such as RMAppRejectedEvent and RMAppFinishedAttemptEvent be removed? They were also added only to handle the message. Or should these classes be kept as they are, for event separation and future updates? What do you think?

> Killed application diagnostics message should be set rather than having a static message
> -----------------------------------------------------------------------------------------
>
>                 Key: YARN-4111
>                 URL: https://issues.apache.org/jira/browse/YARN-4111
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>            Reporter: Rohith Sharma K S
>            Assignee: nijel
>         Attachments: YARN-4111_1.patch, YARN-4111_2.patch, YARN-4111_3.patch, YARN-4111_4.patch
>
>
> An application can be killed either by the *user via ClientRMService* or *by the scheduler*. Currently the diagnostic message is set statically, i.e. {{Application killed by user.}}, regardless of whether the application was killed by the scheduler. This confuses users after an application is killed: they did not kill the application at all, yet the diagnostic message states that the 'application is killed by user'.
> It would be useful if the diagnostic message were different for each cause of the kill.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
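For illustration, adding the message to the base event could look roughly like the sketch below; it is inferred from this discussion rather than copied from the attached patches, and whether RMAppRejectedEvent and RMAppFinishedAttemptEvent then become redundant depends on whether they carry anything beyond the message:

{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.event.AbstractEvent;

// Sketch: one event type that carries a per-cause diagnostic message,
// so callers (ClientRMService, scheduler, ...) can describe why the
// application was killed instead of relying on a static string.
public class RMAppEvent extends AbstractEvent<RMAppEventType> {
  private final ApplicationId appId;
  private final String diagnosticMsg;

  public RMAppEvent(ApplicationId appId, RMAppEventType type) {
    this(appId, type, "");
  }

  public RMAppEvent(ApplicationId appId, RMAppEventType type,
      String diagnosticMsg) {
    super(type);
    this.appId = appId;
    this.diagnosticMsg = diagnosticMsg;
  }

  public ApplicationId getApplicationId() {
    return this.appId;
  }

  public String getDiagnosticMsg() {
    return this.diagnosticMsg;
  }
}
{code}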
[jira] [Commented] (YARN-4205) Add a service for monitoring application life time out
[ https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909121#comment-14909121 ]

nijel commented on YARN-4205:
-----------------------------
The test cases are failing with "method not found" for the method added in the api project, yet these tests pass locally! I cannot find the reason for this failure. Could a build issue cause this?

> Add a service for monitoring application life time out
> -------------------------------------------------------
>
>                 Key: YARN-4205
>                 URL: https://issues.apache.org/jira/browse/YARN-4205
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: scheduler
>            Reporter: nijel
>            Assignee: nijel
>         Attachments: YARN-4205_01.patch, YARN-4205_02.patch
>
>
> This JIRA intends to provide a lifetime monitor service. The service will monitor the applications for which a lifetime is configured, and kill any application that runs beyond its lifetime. The lifetime is measured from the submit time. The monitoring thread's interval is configurable.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
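To make the proposal concrete, a minimal sketch of such a monitor is shown below; it is based only on the issue description (the real patch is presumably built on YARN's service and event framework), and all names in it are illustrative:

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of an application lifetime monitor: applications
// that run past submitTime + lifetime are killed; the scan interval is
// configurable, matching the description above.
public class AppLifetimeMonitor {
  /** appId -> absolute deadline in ms (submitTime + lifetime). */
  private final Map<String, Long> deadlines = new ConcurrentHashMap<String, Long>();
  private final ScheduledExecutorService scanner =
      Executors.newSingleThreadScheduledExecutor();

  public void start(long scanIntervalMs) {
    scanner.scheduleWithFixedDelay(new Runnable() {
      @Override
      public void run() {
        scan();
      }
    }, scanIntervalMs, scanIntervalMs, TimeUnit.MILLISECONDS);
  }

  public void register(String appId, long submitTimeMs, long lifetimeMs) {
    deadlines.put(appId, submitTimeMs + lifetimeMs);
  }

  private void scan() {
    long now = System.currentTimeMillis();
    for (Map.Entry<String, Long> e : deadlines.entrySet()) {
      if (now > e.getValue()) {
        deadlines.remove(e.getKey());
        kill(e.getKey());
      }
    }
  }

  private void kill(String appId) {
    // Placeholder: the real service would dispatch an app-kill event
    // with a suitable diagnostic message (cf. YARN-4111).
  }
}
{code}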
[jira] [Commented] (YARN-4155) TestLogAggregationService.testLogAggregationServiceWithInterval failing
[ https://issues.apache.org/jira/browse/YARN-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909291#comment-14909291 ] Hadoop QA commented on YARN-4155: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 6m 22s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 37s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 27s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 13s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 8m 28s | Tests passed in hadoop-yarn-server-nodemanager. | | | | 26m 41s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12762522/0003-YARN-4155.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / 7a3c381 | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9274/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9274/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9274/console | This message was automatically generated. > TestLogAggregationService.testLogAggregationServiceWithInterval failing > --- > > Key: YARN-4155 > URL: https://issues.apache.org/jira/browse/YARN-4155 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 > Environment: Jenkins >Reporter: Steve Loughran >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-4155.patch, 0001-YARN-4155.patch, > 0003-YARN-4155.patch > > > Test failing on Jenkins: > {{TestLogAggregationService.testLogAggregationServiceWithInterval}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4075) [reader REST API] implement support for querying for flows and flow runs
[ https://issues.apache.org/jira/browse/YARN-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909354#comment-14909354 ] Varun Saxena commented on YARN-4075: Thanks Vrushali for review and commit. Thanks Li, Sangjin and Joep for reviews. > [reader REST API] implement support for querying for flows and flow runs > > > Key: YARN-4075 > URL: https://issues.apache.org/jira/browse/YARN-4075 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-4075-YARN-2928.03.patch, > YARN-4075-YARN-2928.04.patch, YARN-4075-YARN-2928.05.patch, > YARN-4075-YARN-2928.POC.1.patch, YARN-4075-YARN-2928.POC.2.patch > > > We need to be able to query for flows and flow runs via REST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4155) TestLogAggregationService.testLogAggregationServiceWithInterval failing
[ https://issues.apache.org/jira/browse/YARN-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4155: --- Attachment: (was: 0003-YARN-4155.patch) > TestLogAggregationService.testLogAggregationServiceWithInterval failing > --- > > Key: YARN-4155 > URL: https://issues.apache.org/jira/browse/YARN-4155 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 > Environment: Jenkins >Reporter: Steve Loughran >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-4155.patch, 0001-YARN-4155.patch, > 0003-YARN-4155.patch > > > Test failing on Jenkins: > {{TestLogAggregationService.testLogAggregationServiceWithInterval}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4155) TestLogAggregationService.testLogAggregationServiceWithInterval failing
[ https://issues.apache.org/jira/browse/YARN-4155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4155: --- Attachment: 0003-YARN-4155.patch > TestLogAggregationService.testLogAggregationServiceWithInterval failing > --- > > Key: YARN-4155 > URL: https://issues.apache.org/jira/browse/YARN-4155 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 > Environment: Jenkins >Reporter: Steve Loughran >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-4155.patch, 0001-YARN-4155.patch, > 0003-YARN-4155.patch > > > Test failing on Jenkins: > {{TestLogAggregationService.testLogAggregationServiceWithInterval}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3943) Use separate threshold configurations for disk-full detection and disk-not-full detection.
[ https://issues.apache.org/jira/browse/YARN-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated YARN-3943:
----------------------------
    Attachment: YARN-3943.001.patch

> Use separate threshold configurations for disk-full detection and disk-not-full detection.
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-3943
>                 URL: https://issues.apache.org/jira/browse/YARN-3943
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Critical
>         Attachments: YARN-3943.000.patch, YARN-3943.001.patch
>
>
> Use separate threshold configurations to check when disks become full and when disks become good. Currently the configurations "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" and "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" are used to check both when disks become full and when disks become good. It would be better to use two configurations: one applied when a disk goes from not-full to full and the other when it goes from full to not-full, so that the state does not oscillate frequently. For example, we can set the threshold for disk-full detection higher than the one for disk-not-full detection.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-4127) RM fail with noAuth error if switched from non-failover mode to failover mode
[ https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-4127:
-------------------------------
    Attachment: YARN-4127.01.patch

> RM fail with noAuth error if switched from non-failover mode to failover mode
> ------------------------------------------------------------------------------
>
>                 Key: YARN-4127
>                 URL: https://issues.apache.org/jira/browse/YARN-4127
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jian He
>            Assignee: Varun Saxena
>         Attachments: YARN-4127.01.patch
>
>
> The scenario is that RM failover was initially enabled, so the zkRootNodeAcl is by default set with the *RM ID* in the ACL string. If RM failover is then disabled, the RM cannot load data from ZK and fails with a noAuth error. After I reset the root node ACL, it can access the data again.
> {code}
> 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>   at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>   at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009)
>   at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985)
>   at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194)
> {code}
> The problem may be that in non-failover mode the RM doesn't use the *RM-ID* to connect to ZK and thus fails with a noAuth error.
> We should be able to switch failover on and off with no interruption to the user.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
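Independent of the patch, the effective ACL on the state store's root znode can be inspected, and if necessary reset, with a small Curator program along the following lines; the connect string and znode path are examples, with the default root path assumed to be {{/rmstore/ZKRMStateRoot}}:

{code}
import java.util.List;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryNTimes;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.data.ACL;

// Diagnostic sketch, not part of the patch: inspect (and cautiously
// reset) the ACL on the RM state store's root znode. Resetting to
// OPEN_ACL_UNSAFE removes all protection and should only be done
// while recovering a cluster.
public class ZkAclInspector {
  public static void main(String[] args) throws Exception {
    CuratorFramework zk = CuratorFrameworkFactory.newClient(
        "zk1:2181", new RetryNTimes(3, 1000)); // example connect string
    zk.start();
    try {
      String root = "/rmstore/ZKRMStateRoot"; // assumed default path
      List<ACL> acls = zk.getACL().forPath(root);
      System.out.println("Current ACLs on " + root + ": " + acls);
      // Uncomment to reset (dangerous: world-readable/writable):
      // zk.setACL().withACL(ZooDefs.Ids.OPEN_ACL_UNSAFE).forPath(root);
    } finally {
      zk.close();
    }
  }
}
{code}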
[jira] [Updated] (YARN-3964) Support NodeLabelsProvider at Resource Manager side
[ https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dian Fu updated YARN-3964:
--------------------------
    Attachment: YARN-3964.010.patch

> Support NodeLabelsProvider at Resource Manager side
> ---------------------------------------------------
>
>                 Key: YARN-3964
>                 URL: https://issues.apache.org/jira/browse/YARN-3964
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Dian Fu
>            Assignee: Dian Fu
>         Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, YARN-3964.1.patch
>
>
> Currently, a CLI/REST API is provided in the Resource Manager to allow users to specify labels for nodes. For labels which may change over time, users have to run a cron job to update the labels. This has the following limitations:
> - The cron job needs to run as the YARN admin user.
> - It is somewhat complicated to maintain, as users have to make sure this service/daemon stays alive.
> Adding a Node Labels Provider in the Resource Manager will give users more flexibility.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-3964) Support NodeLabelsProvider at Resource Manager side
[ https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dian Fu updated YARN-3964:
--------------------------
    Attachment: YARN-3964.011.patch

> Support NodeLabelsProvider at Resource Manager side
> ---------------------------------------------------
>
>                 Key: YARN-3964
>                 URL: https://issues.apache.org/jira/browse/YARN-3964
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Dian Fu
>            Assignee: Dian Fu
>         Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, YARN-3964.011.patch, YARN-3964.1.patch
>
>
> Currently, a CLI/REST API is provided in the Resource Manager to allow users to specify labels for nodes. For labels which may change over time, users have to run a cron job to update the labels. This has the following limitations:
> - The cron job needs to run as the YARN admin user.
> - It is somewhat complicated to maintain, as users have to make sure this service/daemon stays alive.
> Adding a Node Labels Provider in the Resource Manager will give users more flexibility.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side
[ https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909535#comment-14909535 ]

Dian Fu commented on YARN-3964:
-------------------------------
Updated the patch to fix the test failures and to change the default interval to 30 minutes.

> Support NodeLabelsProvider at Resource Manager side
> ---------------------------------------------------
>
>                 Key: YARN-3964
>                 URL: https://issues.apache.org/jira/browse/YARN-3964
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Dian Fu
>            Assignee: Dian Fu
>         Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, YARN-3964.011.patch, YARN-3964.1.patch
>
>
> Currently, a CLI/REST API is provided in the Resource Manager to allow users to specify labels for nodes. For labels which may change over time, users have to run a cron job to update the labels. This has the following limitations:
> - The cron job needs to run as the YARN admin user.
> - It is somewhat complicated to maintain, as users have to make sure this service/daemon stays alive.
> Adding a Node Labels Provider in the Resource Manager will give users more flexibility.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
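To make the review discussion concrete, the provider being proposed could look roughly like the sketch below; it is inferred from the issue description and this comment (including the 30-minute default interval), not taken from the attached patches:

{code}
import java.util.Map;
import java.util.Set;
import java.util.Timer;
import java.util.TimerTask;
import org.apache.hadoop.yarn.api.records.NodeId;

// Rough sketch of an RM-side node labels provider. The RM polls the
// provider on a configurable interval (30 minutes by default, per the
// comment above) and applies the returned node -> labels mapping.
public abstract class RMNodeLabelsProvider {
  /** Implementations fetch labels from a script, REST endpoint, etc. */
  public abstract Map<NodeId, Set<String>> getNodeLabels();

  public void startPolling(final long intervalMs) {
    new Timer("RMNodeLabelsProvider", /* isDaemon */ true)
        .scheduleAtFixedRate(new TimerTask() {
          @Override
          public void run() {
            // In the real service this would go through the
            // RMNodeLabelsManager to update the cluster's labels.
            applyLabels(getNodeLabels());
          }
        }, 0, intervalMs);
  }

  protected void applyLabels(Map<NodeId, Set<String>> labels) {
    // Placeholder hook for pushing labels into the label manager.
  }
}
{code}

Running the provider inside the RM removes the need for an externally maintained cron job with YARN admin rights, which is the motivation stated in the description.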
[jira] [Commented] (YARN-4123) Unable to start YARN - Error starting JobHistoryServer
[ https://issues.apache.org/jira/browse/YARN-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909547#comment-14909547 ]

Neelesh Srinivas Salian commented on YARN-4123:
-----------------------------------------------
[~VINOTH.KANAKASABAPATHY], since you are running CDH, I believe https://community.cloudera.com/ will be the right avenue to help you move forward with your question, if you are still observing the behavior.
Closing the JIRA here for now. Please re-open if applicable.
Thank you.

> Unable to start YARN - Error starting JobHistoryServer
> -------------------------------------------------------
>
>                 Key: YARN-4123
>                 URL: https://issues.apache.org/jira/browse/YARN-4123
>             Project: Hadoop YARN
>          Issue Type: Bug
>         Environment: Cloudera CDH 5.4.0
>            Reporter: Vinoth Kanakasabapathy
>
> Hi,
> I am having issues while restarting the YARN service. It keeps failing with the errors shown in the logs below. YARN was working fine until last week, and then these error messages started to pop up all of a sudden. I tried restarting YARN and the Job History Server; nothing worked. Kindly help to alleviate this issue.
> Thanks,
> Vinoth
>
> org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer
> Error starting JobHistoryServer
> java.lang.IllegalAccessError: tried to access class org.apache.hadoop.mapred.JobACLsManager from class org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager
>   at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:503)
>   at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94)
>   at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:145)
>   at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:222)
>   at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:232)
> 1:06:55.303 PM INFO org.apache.hadoop.util.ExitUtil - Exiting with status -1
> 1:06:55.355 PM INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory - Stopping JobHistory
> 1:06:55.358 PM INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer - SHUTDOWN_MSG: Shutting down JobHistoryServer at master/10.144.25.49

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-4127) RM fail with noAuth error if switched from non-failover mode to failover mode
[ https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909608#comment-14909608 ]

Varun Saxena commented on YARN-4127:
------------------------------------
All the test failures are unrelated. They are due to NoClassDefFoundError and hence are most likely caused by parallel builds.

> RM fail with noAuth error if switched from non-failover mode to failover mode
> ------------------------------------------------------------------------------
>
>                 Key: YARN-4127
>                 URL: https://issues.apache.org/jira/browse/YARN-4127
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Jian He
>            Assignee: Varun Saxena
>         Attachments: YARN-4127.01.patch
>
>
> The scenario is that RM failover was initially enabled, so the zkRootNodeAcl is by default set with the *RM ID* in the ACL string. If RM failover is then disabled, the RM cannot load data from ZK and fails with a noAuth error. After I reset the root node ACL, it can access the data again.
> {code}
> 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>   at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>   at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009)
>   at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985)
>   at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:973)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1014)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1010)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1010)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1050)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1194)
> {code}
> The problem may be that in non-failover mode the RM doesn't use the *RM-ID* to connect to ZK and thus fails with a noAuth error.
> We should be able to switch failover on and off with no interruption to the user.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-3964) Support NodeLabelsProvider at Resource Manager side
[ https://issues.apache.org/jira/browse/YARN-3964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909562#comment-14909562 ] Hadoop QA commented on YARN-3964: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 59s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 9m 4s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 40s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 28s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 5s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). | | {color:green}+1{color} | whitespace | 0m 4s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 52s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 39s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 27s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 2m 14s | Tests passed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 63m 50s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 119m 35s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12762564/YARN-3964.011.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / bf37d3d | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9276/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9276/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9276/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9276/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9276/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9276/console | This message was automatically generated. 
> Support NodeLabelsProvider at Resource Manager side
> ---------------------------------------------------
>
>                 Key: YARN-3964
>                 URL: https://issues.apache.org/jira/browse/YARN-3964
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Dian Fu
>            Assignee: Dian Fu
>         Attachments: YARN-3964 design doc.pdf, YARN-3964.002.patch, YARN-3964.003.patch, YARN-3964.004.patch, YARN-3964.005.patch, YARN-3964.006.patch, YARN-3964.007.patch, YARN-3964.007.patch, YARN-3964.008.patch, YARN-3964.009.patch, YARN-3964.010.patch, YARN-3964.011.patch, YARN-3964.1.patch
>
>
> Currently, a CLI/REST API is provided in the Resource Manager to allow users to specify labels for nodes. For labels which may change over time, users have to run a cron job to update the labels. This has the following limitations:
> - The cron job needs to run as the YARN admin user.
> - It is somewhat complicated to maintain, as users have to make sure this service/daemon stays alive.
> Adding a Node Labels Provider in the Resource Manager will give users more flexibility.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-4127) RM fail with noAuth error if switched from non-failover mode to failover mode
[ https://issues.apache.org/jira/browse/YARN-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909425#comment-14909425 ] Hadoop QA commented on YARN-4127: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 14s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 9s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 22s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 51s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 38s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 59m 23s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 100m 20s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.TestApplicationMasterService | | | hadoop.yarn.server.resourcemanager.TestRMHAForNodeLabels | | Timed out tests | org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12762528/YARN-4127.01.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 861b52d | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9275/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9275/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9275/console | This message was automatically generated. > RM fail with noAuth error if switched from non-failover mode to failover mode > -- > > Key: YARN-4127 > URL: https://issues.apache.org/jira/browse/YARN-4127 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Varun Saxena > Attachments: YARN-4127.01.patch > > > The scenario is that RM failover was initially enabled, so the zkRootNodeAcl > is by default set with the *RM ID* in the ACL string > If RM failover is then switched to be disabled, it cannot load data from ZK > and fail with noAuth error. After I reset the root node ACL, it again can > access. 
> {code}
> 15/09/08 14:28:34 ERROR resourcemanager.ResourceManager: Failed to load/recover state
> org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:949)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:915)
>   at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
>   at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
>   at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
>   at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
>   at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>   at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
>   at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1009)
>   at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeSetData(ZKRMStateStore.java:985)
>   at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.getAndIncrementEpoch(ZKRMStateStore.java:374)
>   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:579)
>   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>   at