[jira] [Commented] (YARN-6827) [ATS1/1.5] NPE exception while publishing recovering applications into ATS during RM restart.
[ https://issues.apache.org/jira/browse/YARN-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16445384#comment-16445384 ]

Rohith Sharma K S commented on YARN-6827:
-----------------------------------------

Cherry-picked to branch-2 as well. Thanks to [~sunilg] for reviewing and committing the patch.

> [ATS1/1.5] NPE exception while publishing recovering applications into ATS during RM restart.
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-6827
>                 URL: https://issues.apache.org/jira/browse/YARN-6827
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Major
>             Fix For: 2.10.0, 3.2.0, 3.1.1, 3.0.3
>
>         Attachments: YARN-6827.01.patch
>
>
> While recovering applications, an NPE is thrown as shown below.
> {noformat}
> 017-07-13 14:08:12,476 ERROR org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher: Error when publishing entity [YARN_APPLICATION,application_1499929227397_0001]
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:178)
>     at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.putEntity(TimelineServiceV1Publisher.java:368)
> {noformat}
> This is because, during RM service start in the non-HA case, the active services are started first and the ATSv1 services are started later. In the HA case, the transitionToActive event arrives before the ATS services are started.
> This gives the active services enough time to recover the applications, which try to publish into ATSv1 while recovering. Since the ATS services are not started yet, an NPE is thrown.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
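The start-order problem described in the issue can be illustrated with a minimal, self-contained sketch. It assumes the NPE comes from a field that is only initialized when the service starts, which the description implies but does not show. All class and method names here (SketchPublisher, SketchTimelineClient, StartOrderSketch) are hypothetical stand-ins, not the real TimelineClientImpl or TimelineServiceV1Publisher.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical driver: compares publishing before and after start(). */
class StartOrderSketch {
  /** Returns true if publishing before start() fails with an NPE. */
  static boolean publishBeforeStartFails() {
    SketchPublisher publisher = new SketchPublisher();
    try {
      // Mirrors recovery publishing an application before ATS is started.
      publisher.appFinished("application_1499929227397_0001");
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  /** Returns the number of entities published when start() runs first. */
  static int publishAfterStart() {
    SketchPublisher publisher = new SketchPublisher();
    publisher.start();
    publisher.appFinished("application_1499929227397_0001");
    return publisher.publishedCount();
  }

  public static void main(String[] args) {
    System.out.println("publish before start fails: " + publishBeforeStartFails());
    System.out.println("published after start: " + publishAfterStart());
  }
}

/** Hypothetical stand-in for a timeline client; it only records entities. */
class SketchTimelineClient {
  private final List<String> published = new ArrayList<>();

  void putEntity(String entity) {
    published.add(entity);
  }

  int size() {
    return published.size();
  }
}

/** Hypothetical publisher whose client exists only after start(). */
class SketchPublisher {
  private SketchTimelineClient client; // stays null until start() runs

  void start() {
    client = new SketchTimelineClient();
  }

  void appFinished(String appId) {
    // Dereferences the client; throws NullPointerException if start() has not run.
    client.putEntity("YARN_APPLICATION," + appId);
  }

  int publishedCount() {
    return client == null ? 0 : client.size();
  }
}
```

The sketch only shows why ordering matters: the same publish call succeeds or fails depending on whether the publishing service was started before recovery triggered it.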
[jira] [Commented] (YARN-6827) [ATS1/1.5] NPE exception while publishing recovering applications into ATS during RM restart.
[ https://issues.apache.org/jira/browse/YARN-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444580#comment-16444580 ]

Hudson commented on YARN-6827:
------------------------------

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #14029 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14029/])
YARN-6827. [ATS1/1.5] NPE exception while publishing recovering (sunilg: rev 7d06806dfdeb3252ac0defe23e8c468eabfa8b5e)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
[jira] [Commented] (YARN-6827) [ATS1/1.5] NPE exception while publishing recovering applications into ATS during RM restart.
[ https://issues.apache.org/jira/browse/YARN-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442567#comment-16442567 ]

Sunil G commented on YARN-6827:
-------------------------------

Patch looks fine. If there are no objections, I will commit the patch tomorrow. Thank you.
[jira] [Commented] (YARN-6827) [ATS1/1.5] NPE exception while publishing recovering applications into ATS during RM restart.
[ https://issues.apache.org/jira/browse/YARN-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442263#comment-16442263 ]

genericqa commented on YARN-6827:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 27s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 22m 43s | trunk passed |
| +1 | compile | 0m 40s | trunk passed |
| +1 | checkstyle | 0m 32s | trunk passed |
| +1 | mvnsite | 0m 42s | trunk passed |
| +1 | shadedclient | 10m 30s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 4s | trunk passed |
| +1 | javadoc | 0m 25s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 39s | the patch passed |
| +1 | compile | 0m 35s | the patch passed |
| +1 | javac | 0m 35s | the patch passed |
| +1 | checkstyle | 0m 28s | the patch passed |
| +1 | mvnsite | 0m 39s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 34s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 9s | the patch passed |
| +1 | javadoc | 0m 24s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 66m 27s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 118m 5s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-6827 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12919580/YARN-6827.01.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux c83b0b21f9d2 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 034da8f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20389/testReport/ |
| Max. process+thread count | 815 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20389/console |
| Powered by | Apache Yetus 0.8
[jira] [Commented] (YARN-6827) [ATS1/1.5] NPE exception while publishing recovering applications into ATS during RM restart.
[ https://issues.apache.org/jira/browse/YARN-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442137#comment-16442137 ]

Rohith Sharma K S commented on YARN-6827:
-----------------------------------------

Updated the patch so that the transition to active happens after RM service start, and only in non-HA deployments. I tested the patch on a real cluster. [~sunilg], could you review the patch?
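The ordering change described in this comment can be sketched as follows. This is only an illustration of the idea under stated assumptions, not the contents of YARN-6827.01.patch: the class OrderingFixSketch, the method startupOrder, the deferTransition flag, and the step names are all hypothetical, and the real ResourceManager start sequence involves many more services.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the RM start sequence; not the real ResourceManager. */
class OrderingFixSketch {
  /**
   * Records the order of two hypothetical startup steps.
   *
   * @param haEnabled       whether HA is on; in HA the transition is assumed to be
   *                        driven externally (by the elector), so it is not recorded here
   * @param deferTransition whether the non-HA transition waits until the other
   *                        services, including the ATS publisher, have started
   */
  static List<String> startupOrder(boolean haEnabled, boolean deferTransition) {
    List<String> steps = new ArrayList<>();
    if (!haEnabled && !deferTransition) {
      // Order described in the issue: recovery can publish before ATS is up.
      steps.add("transitionToActive");
    }
    steps.add("startAtsPublisher");
    if (!haEnabled && deferTransition) {
      // Order described in the comment: transition only after service start.
      steps.add("transitionToActive");
    }
    return steps;
  }

  public static void main(String[] args) {
    System.out.println("non-HA, not deferred: " + startupOrder(false, false));
    System.out.println("non-HA, deferred:     " + startupOrder(false, true));
    System.out.println("HA:                   " + startupOrder(true, true));
  }
}
```

With the transition deferred, application recovery (which runs as part of becoming active) can no longer reach the publisher before it has been started in the non-HA path.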
[jira] [Commented] (YARN-6827) [ATS1/1.5] NPE exception while publishing recovering applications into ATS during RM restart.
[ https://issues.apache.org/jira/browse/YARN-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088468#comment-16088468 ]

Rohith Sharma K S commented on YARN-6827:
-----------------------------------------

Attaching the failure trace below. This shows that applications are recovered first before ATS services are started.
{noformat}
2017-07-15 10:19:35,200 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to active state
2017-07-15 10:19:35,245 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Recovery started
2017-07-15 10:19:35,253 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Loaded RM state version info 1.4
2017-07-15 10:19:35,431 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Unknown child node with name: HIERARCHIES
2017-07-15 10:19:35,452 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager: recovering RMDelegationTokenSecretManager.
2017-07-15 10:19:35,455 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Recovering 16 applications
2017-07-15 10:19:35,518 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Priority '0' is acceptable in queue : default for application: application_1499929227397_0001
2017-07-15 10:19:35,578 ERROR org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher: Error when publishing entity [YARN_APPLICATION,application_1499929227397_0001]
java.lang.NullPointerException
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:178)
    at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.putEntity(TimelineServiceV1Publisher.java:368)
    at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.appFinished(TimelineServiceV1Publisher.java:156)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$FinalTransition.transition(RMAppImpl.java:1472)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1073)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1062)
    at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:887)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:383)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:590)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1372)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:749)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1131)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1171)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1167)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:317)
    at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:143)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:893)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:472)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:607)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)
{noformat}

> [ATS1/1.5] NPE exception while publishing recovering applications into ATS during RM restart.