[jira] [Commented] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted
[ https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925358#comment-16925358 ]

Rohith Sharma K S commented on YARN-9820:
-----------------------------------------

+1 lgtm as well.

> RM logs InvalidStateTransitionException when app is submitted
> -------------------------------------------------------------
>
>                 Key: YARN-9820
>                 URL: https://issues.apache.org/jira/browse/YARN-9820
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Rohith Sharma K S
>            Assignee: Prabhu Joseph
>            Priority: Critical
>         Attachments: YARN-9820-001.patch, YARN-9820-002.patch, YARN-9820-003.patch
>
>
> It is observed that RM logs InvalidStateTransitionException. Not sure what is the impact but its better to handle it.
> {noformat}
> 2019-09-08 12:40:46,327 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1567926390667_0001_01 State change from ALLOCATED to LAUNCHED on event = LAUNCHED
> 2019-09-08 12:40:46,327 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: update the launch time for applicationId: application_1567926390667_0001, attemptId: appattempt_1567926390667_0001_01launchTime: 1567926646327
> 2019-09-08 12:40:46,328 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1567926390667_0001
> 2019-09-08 12:40:46,332 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: App: application_1567926390667_0001 can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: APP_UPDATE_SAVED at ACCEPTED
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:881)
>         at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1030)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1014)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133)
>         at java.lang.Thread.run(Thread.java:748)
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
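
The log above reports "Invalid event: APP_UPDATE_SAVED at ACCEPTED", meaning the application's state machine had no transition registered for that event in that state. A minimal sketch of the mechanism follows. It is a toy table-driven state machine, not Hadoop's StateMachineFactory; the class name, the reduced set of states and events, and the tolerateUpdateSaved flag are all hypothetical. Registering a no-op self-transition is one common way to tolerate such an event; whether the attached patches take that route is not stated in this thread.

```java
import java.util.EnumMap;
import java.util.Map;

/** Toy state machine for illustration only; this is not Hadoop's StateMachineFactory. */
final class AppStateMachineSketch {
    enum State { NEW, ACCEPTED, RUNNING }
    enum Event { START, ATTEMPT_REGISTERED, APP_UPDATE_SAVED }

    // For each state, the events it accepts and the state each event leads to.
    private final Map<State, Map<Event, State>> transitions = new EnumMap<>(State.class);
    private State current = State.NEW;

    void addTransition(State from, State to, Event on) {
        transitions.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, to);
    }

    /** Applies the event, or throws if the current state has no transition for it. */
    State handle(Event event) {
        Map<Event, State> accepted = transitions.get(current);
        State next = (accepted == null) ? null : accepted.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }

    /** Builds the toy machine; the (hypothetical) flag adds a no-op self-transition on ACCEPTED. */
    static AppStateMachineSketch build(boolean tolerateUpdateSaved) {
        AppStateMachineSketch sm = new AppStateMachineSketch();
        sm.addTransition(State.NEW, State.ACCEPTED, Event.START);
        sm.addTransition(State.ACCEPTED, State.RUNNING, Event.ATTEMPT_REGISTERED);
        if (tolerateUpdateSaved) {
            // The event is accepted and the state stays ACCEPTED.
            sm.addTransition(State.ACCEPTED, State.ACCEPTED, Event.APP_UPDATE_SAVED);
        }
        return sm;
    }
}
```

With build(false), sending START and then APP_UPDATE_SAVED throws with the same message shape as the log; with build(true), the second event leaves the machine in ACCEPTED.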
[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down
[ https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925354#comment-16925354 ]

Hadoop QA commented on YARN-9821:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 33s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 19m 5s | trunk passed |
| +1 | compile | 0m 20s | trunk passed |
| +1 | checkstyle | 0m 17s | trunk passed |
| +1 | mvnsite | 0m 24s | trunk passed |
| +1 | shadedclient | 12m 3s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 29s | trunk passed |
| +1 | javadoc | 0m 19s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 24s | the patch passed |
| +1 | compile | 0m 16s | the patch passed |
| +1 | javac | 0m 16s | the patch passed |
| +1 | checkstyle | 0m 11s | the patch passed |
| +1 | mvnsite | 0m 19s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 47s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 34s | the patch passed |
| +1 | javadoc | 0m 17s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 25s | hadoop-yarn-server-timelineservice-hbase-client in the patch passed. |
| +1 | asflicense | 0m 30s | The patch does not generate ASF License warnings. |
| | | 49m 46s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9821 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12979810/YARN-9821-002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 5a91a9ee6a79 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 3b9584d |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24773/testReport/ |
| Max. process+thread count | 340 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbas
[jira] [Commented] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted
[ https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925351#comment-16925351 ]

Jonathan Hung commented on YARN-9820:
-------------------------------------

Thanks [~Prabhu Joseph]. +1 pending jenkins.
[jira] [Commented] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted
[ https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925346#comment-16925346 ]

Prabhu Joseph commented on YARN-9820:
-------------------------------------

Thanks [~jhung] and [~rohithsharma] for detailed review. Have used this approach in [^YARN-9820-003.patch].
[jira] [Updated] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted
[ https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated YARN-9820:
--------------------------------
    Attachment: YARN-9820-003.patch
[jira] [Commented] (YARN-9816) EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError
[ https://issues.apache.org/jira/browse/YARN-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925341#comment-16925341 ]

Prabhu Joseph commented on YARN-9816:
-------------------------------------

[~abmodi] Can you review this Jira when you get time? The patch ignores unexpected files in the /ats/active directory, which cause the EntityLogScanner thread to crash with a StackOverflowError. Thanks.

> EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError
> -----------------------------------------------------------------------
>
>                 Key: YARN-9816
>                 URL: https://issues.apache.org/jira/browse/YARN-9816
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: timelineserver
>    Affects Versions: 3.1.0, 3.2.0, 3.3.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>         Attachments: YARN-9816-001.patch
>
>
> EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError.
> This happens when a file is present under /ats/active.
> {code}
> [hdfs@node2 yarn]$ hadoop fs -ls /ats/active
> Found 1 items
> -rw-r--r-- 3 hdfs hadoop 0 2019-09-06 16:34 /ats/active/.distcp.tmp.attempt_155759136_39768_m_01_0
> {code}
> Error Message:
> {code:java}
> java.lang.StackOverflowError
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:632)
>         at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
>         at com.sun.proxy.$Proxy15.getListing(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2143)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1076)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1088)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1059)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1038)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1034)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1046)
>         at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.list(EntityGroupFSTimelineStore.java:398)
>         at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:368)
>         at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
>         at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
>         at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
>         at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
>         at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
>         at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
>         at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
>         at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
>         at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
>         at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
>         at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
>         at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
>         at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383)
> {code}
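
The repeated scanActiveLogs frames above suggest a recursive directory scan that keeps descending into an entry that is not a directory. A minimal sketch of a scan that skips such stray files follows. It uses a hypothetical in-memory tree rather than the HDFS client API, and it assumes (as a model only) that listing a plain file yields the file itself, which is what would make an unguarded recursion loop. The class and method names are invented for illustration and are not taken from EntityGroupFSTimelineStore or from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative scan over a hypothetical in-memory tree; not the HDFS or YARN API. */
final class ActiveLogScanSketch {
    // Directory path -> child paths. Any path absent from the map is treated as a plain file.
    private final Map<String, List<String>> dirs;

    ActiveLogScanSketch(Map<String, List<String>> dirs) {
        this.dirs = dirs;
    }

    boolean isDirectory(String path) {
        return dirs.containsKey(path);
    }

    // Assumed listing semantics: listing a plain file returns that file itself.
    List<String> list(String path) {
        return isDirectory(path) ? dirs.get(path) : List.of(path);
    }

    /** Collects entries named application_*, descending only into real directories. */
    List<String> scan(String dir) {
        List<String> found = new ArrayList<>();
        for (String child : list(dir)) {
            String name = child.substring(child.lastIndexOf('/') + 1);
            if (name.startsWith("application_")) {
                found.add(child);
            } else if (isDirectory(child)) {
                found.addAll(scan(child));
            }
            // Otherwise the entry is a stray file: skip it. Recursing here would
            // list the same file again and never terminate.
        }
        return found;
    }
}
```

Without the isDirectory guard, a file such as the .distcp.tmp entry shown in the listing would be passed back into scan, listed as itself, and recursed into again until the stack overflows.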
[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down
[ https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925332#comment-16925332 ]

Prabhu Joseph commented on YARN-9821:
-------------------------------------

Thanks [~rohithsharma] and [~abmodi] for reviewing. Have fixed the review comments in [^YARN-9821-002.patch].
[jira] [Updated] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down
[ https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated YARN-9821:
--------------------------------
    Attachment: YARN-9821-002.patch

> NM hangs at serviceStop when ATSV2 Backend Hbase is Down
> --------------------------------------------------------
>
>                 Key: YARN-9821
>                 URL: https://issues.apache.org/jira/browse/YARN-9821
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: ATSv2
>    Affects Versions: 3.2.0, 3.3.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>         Attachments: YARN-9821-001.patch, YARN-9821-002.patch
>
>
> NM hangs at serviceStop when ATSV2 Backend Hbase is Down.
> {code}
> "Thread-197" #302 prio=5 os_prio=0 tid=0x7f5f389ba000 nid=0x631d waiting for monitor entry [0x7f5f1f29b000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:249)
>         - waiting to lock <0x0006c834d148> (a org.apache.hadoop.hbase.client.BufferedMutatorImpl)
>         at org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.close(TypedBufferedMutator.java:62)
>         at org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.serviceStop(HBaseTimelineWriterImpl.java:636)
>         at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>         - locked <0x0006c7c05808> (a java.lang.Object)
>         at org.apache.hadoop.service.AbstractService.close(AbstractService.java:247)
>         at org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.serviceStop(TimelineCollectorManager.java:244)
>         at org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStop(NodeTimelineCollectorManager.java:164)
>         at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>         - locked <0x0006c7c05890> (a java.lang.Object)
>         at org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStop(PerNodeTimelineCollectorsAuxService.java:113)
>         at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>         - locked <0x0006c7c058f8> (a java.lang.Object)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStop(AuxServices.java:330)
>         - locked <0x0006c7c23400> (a java.util.Collections$SynchronizedMap)
>         at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>         - locked <0x0006c7c059a8> (a java.lang.Object)
>         at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>         at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>         at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>         at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStop(ContainerManagerImpl.java:720)
>         at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>         - locked <0x0006c7c05a98> (a java.lang.Object)
>         at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
>         at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
>         at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
>         at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:526)
>         at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
>         - locked <0x0006c7c05c88> (a java.lang.Object)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager$1.run(NodeManager.java:552)
>
>
> "qtp183259297-76" #76 daemon prio=5 os_prio=0 tid=0x7f5f567ed000 nid=0x5fb7 in Object.wait() [0x7f5f23ad7000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:460)
>         at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
>         at org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForSpecificCompletedTask(ResultBoundedCompletionService.java:258)
>         - locked <0x000784ee8220> (a [Lorg.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture;)
>         at org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForFirstSuccessfullyCompletedTask(ResultBoundedCompletionService.java:214)
>         at org.
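
The thread dump above shows the shutdown thread blocked in BufferedMutatorImpl.close() while another thread waits on the HBase client. A minimal sketch of one way to keep a stop path from blocking on an unavailable backend follows, assuming the writer tracks availability in a flag. The review on this issue mentions a flag named isHbaseUp (proposed rename: isStorageUp) and asks that the exception be logged, but how the attached patches use the flag is not shown in this thread, so the Backend interface, markStorageDown, and the in-memory log below are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative writer that skips a potentially blocking close when storage is down. */
final class StorageAwareWriterSketch {
    /** Hypothetical stand-in for a storage client whose close may block on flush. */
    interface Backend {
        void flushAndClose() throws Exception;
    }

    private final Backend backend;
    private final AtomicBoolean storageUp = new AtomicBoolean(true);
    // Stand-in for a real logger so the sketch stays self-contained.
    final List<String> log = new ArrayList<>();

    StorageAwareWriterSketch(Backend backend) {
        this.backend = backend;
    }

    /** Called by whatever monitors the backend (assumed to exist) when it fails. */
    void markStorageDown(Exception cause) {
        storageUp.set(false);
        log.add("Storage is down: " + cause); // log the exception too
    }

    boolean isStorageUp() {
        return storageUp.get();
    }

    /** Stops the writer; pending data is not flushed if storage is known to be down. */
    void stop() {
        if (!isStorageUp()) {
            log.add("Skipping flush on stop: storage is down");
            return;
        }
        try {
            backend.flushAndClose();
        } catch (Exception e) {
            log.add("Error while closing backend: " + e);
        }
    }
}
```

The trade-off in this sketch is that buffered data is dropped on shutdown when storage is down, in exchange for a stop call that returns promptly.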
[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down
[ https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925331#comment-16925331 ]

Abhishek Modi commented on YARN-9821:
-------------------------------------

Thanks [~Prabhu Joseph] for the patch. Some minor comments:
# Can we rename isHbaseUp => isStorageUp to make it more generic.
# Can we log the exception too.

Apart from these minor comments, it looks good to me.
[jira] [Updated] (YARN-9349) When doTransition() method occurs exception, the log level practices are inconsistent
[ https://issues.apache.org/jira/browse/YARN-9349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anuhan Torgonshar updated YARN-9349: Flags: (was: Important) > When doTransition() method occurs exception, the log level practices are > inconsistent > - > > Key: YARN-9349 > URL: https://issues.apache.org/jira/browse/YARN-9349 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 3.1.0, 2.8.5 >Reporter: Anuhan Torgonshar >Priority: Major > Labels: easyfix > Fix For: 3.3.0 > > Attachments: YARN-9349.trunk.patch > > > There are *inconsistent* log level practices when code catches > *_InvalidStateTransitionException_* for _*doTransition()*_ method. > {code:java} > **WARN level** > /** > file path: > hadoop-2.8.5-src\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\src\main\java\org\apache\hadoop\yarn\server\nodemanager\containermanager\application\ApplicationImpl.java > log statement line number: 482 > log level:warn > **/ > try { >// queue event requesting init of the same app >newState = stateMachine.doTransition(event.getType(), event); > } catch (InvalidStateTransitionException e) { >LOG.warn("Can't handle this event at current state", e); > } > /** > file path: > hadoop-2.8.5-src\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\src\main\java\org\apache\hadoop\yarn\server\nodemanager\containermanager\localizer\LocalizedResource.java > log statement line number: 200 > log level:warn > **/ > try { >newState = this.stateMachine.doTransition(event.getType(), event); > } catch (InvalidStateTransitionException e) { >LOG.warn("Can't handle this event at current state", e); > } > /** > file path: > hadoop-2.8.5-src\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\src\main\java\org\apache\hadoop\yarn\server\nodemanager\containermanager\container\ContainerImpl.java > log statement line number: 1156 > log level:warn > 
**/ > try { > newState = > stateMachine.doTransition(event.getType(), event); > } catch (InvalidStateTransitionException e) { > LOG.warn("Can't handle this event at current state: Current: [" > + oldState + "], eventType: [" + event.getType() + "]", e); > } > **ERROR level** > /** > file path: > hadoop-2.8.5-src\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-resourcemanager\src\main\java\org\apache\hadoop\yarn\server\resourcemanager\rmapp\attempt\RMAppAttemptImpl.java > log statement line number:878 > log level: error > **/ > try { >/* keep the master in sync with the state machine */ >this.stateMachine.doTransition(event.getType(), event); > } catch (InvalidStateTransitionException e) { >LOG.error("App attempt: " + appAttemptID >+ " can't handle this event at current state", e); >onInvalidTranstion(event.getType(), oldState); > } > /** > file path: > hadoop-2.8.5-src\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-resourcemanager\src\main\java\org\apache\hadoop\yarn\server\resourcemanager\rmnode\RMNodeImpl.java > log statement line number:623 > log level: error > **/ > try { >stateMachine.doTransition(event.getType(), event); > } catch (InvalidStateTransitionException e) { >LOG.error("Can't handle this event at current state", e); >LOG.error("Invalid event " + event.getType() + >" on Node " + this.nodeId); > } > > //There are 8 similar code snippets with ERROR log level. > {code} > After looking through the whole project, I found 8 similar code > snippets that assign the ERROR level when doTransition() throws > *InvalidStateTransitionException*, and just 3 places that choose the > WARN level in the same situation. Therefore, I think these 3 log statements > should be assigned the ERROR level to stay consistent with the other code snippets. 
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
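One way to keep the level consistent across all call sites is to route every invalid-transition log through a single helper, so the level is decided in one place. The sketch below is only an illustration of that idea, not the attached patch: the class and method names are hypothetical, it uses java.util.logging so that it is self-contained (SEVERE stands in for ERROR there), and the exception type is a stand-in for org.apache.hadoop.yarn.state.InvalidStateTransitionException.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for InvalidStateTransitionException (assumption, not the Hadoop class).
class InvalidTransitionSketch extends RuntimeException {
  InvalidTransitionSketch(String message) {
    super(message);
  }
}

// Hypothetical helper: every catch block calls this instead of choosing its own level.
class TransitionLogging {
  // java.util.logging has no ERROR level; SEVERE is its closest equivalent.
  static final Level INVALID_TRANSITION_LEVEL = Level.SEVERE;

  // Builds the message in the more detailed style already used by ContainerImpl.
  static String describe(String entity, Object state, Object eventType) {
    return entity + " can't handle this event at current state: Current: ["
        + state + "], eventType: [" + eventType + "]";
  }

  // Logs the invalid transition at the single shared level, with the exception attached.
  static void logInvalidTransition(Logger log, String entity, Object state,
      Object eventType, Throwable cause) {
    log.log(INVALID_TRANSITION_LEVEL, describe(entity, state, eventType), cause);
  }
}
```

A catch block would then shrink to a single call such as TransitionLogging.logInvalidTransition(LOG, "Container", oldState, event.getType(), e), and changing the level later would touch only one constant.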
[jira] [Commented] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted
[ https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925326#comment-16925326 ] Rohith Sharma K S commented on YARN-9820: - I agree with [~jhung]'s approach. We should send a notifyApp flag so that RMStateStore can decide whether to trigger an event. > RM logs InvalidStateTransitionException when app is submitted > - > > Key: YARN-9820 > URL: https://issues.apache.org/jira/browse/YARN-9820 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Assignee: Prabhu Joseph >Priority: Critical > Attachments: YARN-9820-001.patch, YARN-9820-002.patch > > > It is observed that RM logs InvalidStateTransitionException. Not sure what is > the impact but it's better to handle it. > {noformat} > 2019-09-08 12:40:46,327 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > appattempt_1567926390667_0001_01 State change from ALLOCATED to LAUNCHED > on event = LAUNCHED > 2019-09-08 12:40:46,327 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: update the > launch time for applicationId: application_1567926390667_0001, attemptId: > appattempt_1567926390667_0001_01launchTime: 1567926646327 > 2019-09-08 12:40:46,328 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating > info for app: application_1567926390667_0001 > 2019-09-08 12:40:46,332 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: App: > application_1567926390667_0001 can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > APP_UPDATE_SAVED at ACCEPTED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:881) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1030) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1014) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133) > at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9612) Support using ip to register NodeID
[ https://issues.apache.org/jira/browse/YARN-9612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925321#comment-16925321 ] Zhankun Tang commented on YARN-9612: [~cane], the background and the motivation are still not clear to me. :) > Support using ip to register NodeID > --- > > Key: YARN-9612 > URL: https://issues.apache.org/jira/browse/YARN-9612 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhoukang >Priority: Major > > In an environment like k8s, we should support using the IP when registering the NodeID with > the RM, since the hostname will be the podName, which cannot be resolved by the DNS of > k8s
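To make the request more concrete, here is a minimal sketch of what "use the IP instead of the hostname" could mean when the node identifier is built. It is an assumption for illustration only: the class, the method names, and the useIp switch are hypothetical and do not correspond to any existing YARN configuration key or API.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch; the names below are not part of YARN.
class NodeIdSketch {
  // Builds "host:port", taking the literal IP when useIp is set,
  // so the RM never has to resolve a pod name through DNS.
  static String buildNodeId(InetAddress address, int port, boolean useIp) {
    String host = useIp ? address.getHostAddress() : address.getHostName();
    return host + ":" + port;
  }

  // Creates an address from a known name and raw IP bytes without any DNS lookup.
  static InetAddress addressOf(String hostName, byte[] ip) {
    try {
      return InetAddress.getByAddress(hostName, ip);
    } catch (UnknownHostException e) {
      throw new IllegalArgumentException("IP must be 4 or 16 bytes long", e);
    }
  }
}
```

With a pod named pod-1 at 10.0.0.5, the hostname form yields an identifier that only resolves inside the cluster DNS, while the IP form is usable by any component that can route to the pod.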
[jira] [Commented] (YARN-9605) Add ZkConfiguredFailoverProxyProvider for RM HA
[ https://issues.apache.org/jira/browse/YARN-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925317#comment-16925317 ] Zhankun Tang commented on YARN-9605: [~cane], Thanks for contributing this. I saw there are failures in the Jenkins result. Could you please try to fix them? > Add ZkConfiguredFailoverProxyProvider for RM HA > --- > > Key: YARN-9605 > URL: https://issues.apache.org/jira/browse/YARN-9605 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: zhoukang >Assignee: zhoukang >Priority: Major > Fix For: 3.2.0, 3.1.2 > > Attachments: YARN-9605.001.patch > > > In this issue, I will track a new feature to support > ZkConfiguredFailoverProxyProvider for RM HA
[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down
[ https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925318#comment-16925318 ] Rohith Sharma K S commented on YARN-9821: - patch looks reasonable to me.. +1. > NM hangs at serviceStop when ATSV2 Backend Hbase is Down > - > > Key: YARN-9821 > URL: https://issues.apache.org/jira/browse/YARN-9821 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.2.0, 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9821-001.patch > > > NM hangs at serviceStop when ATSV2 Backend Hbase is Down. > {code} > "Thread-197" #302 prio=5 os_prio=0 tid=0x7f5f389ba000 nid=0x631d waiting > for monitor entry [0x7f5f1f29b000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:249) > - waiting to lock <0x0006c834d148> (a > org.apache.hadoop.hbase.client.BufferedMutatorImpl) > at > org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.close(TypedBufferedMutator.java:62) > at > org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.serviceStop(HBaseTimelineWriterImpl.java:636) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05808> (a java.lang.Object) > at > org.apache.hadoop.service.AbstractService.close(AbstractService.java:247) > at > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.serviceStop(TimelineCollectorManager.java:244) > at > org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStop(NodeTimelineCollectorManager.java:164) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05890> (a java.lang.Object) > at > 
org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStop(PerNodeTimelineCollectorsAuxService.java:113) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c058f8> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStop(AuxServices.java:330) > - locked <0x0006c7c23400> (a java.util.Collections$SynchronizedMap) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c059a8> (a java.lang.Object) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStop(ContainerManagerImpl.java:720) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05a98> (a java.lang.Object) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:526) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05c88> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager$1.run(NodeManager.java:552) > > > "qtp183259297-76" #76 daemon prio=5 os_prio=0 tid=0x7f5f567ed000 > nid=0x5fb7 in Object.wait() [0x7f5f23ad7000] >java.lang.Thread.State: 
TIMED_WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:460) > at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348) > at > org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForSpecificCompletedTask(ResultBoundedCompletionService.java:258) > - locked <0x000784ee8220> (a > [Lorg.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture;) > at > org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForFirstSuccessfullyCompletedTask(ResultBoundedC
[jira] [Commented] (YARN-9739) appsTableData in AppsBlock may cause OOM
[ https://issues.apache.org/jira/browse/YARN-9739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925308#comment-16925308 ] Zhankun Tang commented on YARN-9739: [~cane], Thanks for catching this point. Do you mean we should make this a cache to serve multiple users' requests? > appsTableData in AppsBlock may cause OOM > > > Key: YARN-9739 > URL: https://issues.apache.org/jira/browse/YARN-9739 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: zhoukang >Priority: Major > Attachments: heap0.png, heap1.png, stack.png > > > If many users list the applications, it may cause an RM OOM
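If the concern is that every request builds its own copy of the table data, the cache asked about above could look roughly like the following. This is a sketch under that assumption only: CachedTableData, its constructor parameters, and the time-to-live policy are hypothetical and are not taken from AppsBlock or from any attached patch.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch: one rendered copy is shared by all requests until it expires.
class CachedTableData {
  private final Supplier<String> renderer;   // builds the table data (expensive)
  private final LongSupplier clockMillis;    // injected clock, so expiry is testable
  private final long ttlMillis;              // how long one rendering stays valid
  private String cached;
  private long renderedAt;

  CachedTableData(Supplier<String> renderer, LongSupplier clockMillis, long ttlMillis) {
    this.renderer = renderer;
    this.clockMillis = clockMillis;
    this.ttlMillis = ttlMillis;
  }

  // Returns the shared copy, re-rendering only when it is missing or older than the TTL.
  synchronized String get() {
    long now = clockMillis.getAsLong();
    if (cached == null || now - renderedAt >= ttlMillis) {
      cached = renderer.get();
      renderedAt = now;
    }
    return cached;
  }
}
```

The trade-off is staleness: users may see a list that is up to ttlMillis old, in exchange for holding one copy in the heap instead of one per concurrent request.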
[jira] [Commented] (YARN-9764) Print application submission context label in application summary
[ https://issues.apache.org/jira/browse/YARN-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925301#comment-16925301 ] Hudson commented on YARN-9764: -- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17255 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17255/]) YARN-9764. Print application submission context label in application (jhung: rev 43e389b9801e09741fdf78fef067b8ac60f691c8) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java > Print application submission context label in application summary > - > > Key: YARN-9764 > URL: https://issues.apache.org/jira/browse/YARN-9764 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Manoj Kumar >Priority: Major > Labels: release-blocker > Attachments: YARN-9764.01.patch, YARN-9764.02.patch, > YARN-9764.branch-2.01.patch, YARN-9764.branch-2.02.patch
[jira] [Updated] (YARN-9764) Print application submission context label in application summary
[ https://issues.apache.org/jira/browse/YARN-9764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated YARN-9764: Attachment: YARN-9764.branch-2.02.patch > Print application submission context label in application summary > - > > Key: YARN-9764 > URL: https://issues.apache.org/jira/browse/YARN-9764 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Hung >Assignee: Manoj Kumar >Priority: Major > Labels: release-blocker > Attachments: YARN-9764.01.patch, YARN-9764.02.patch, > YARN-9764.branch-2.01.patch, YARN-9764.branch-2.02.patch
[jira] [Comment Edited] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted
[ https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925290#comment-16925290 ] Jonathan Hung edited comment on YARN-9820 at 9/9/19 1:46 AM: - Thanks for catching this. Perhaps we can implement it a different way. We can add a new {noformat} public RMStateUpdateAppEvent(ApplicationStateData appState, boolean notifyApplication) { {noformat} constructor to RMStateUpdateAppEvent and a new method {noformat} public void updateApplicationState(ApplicationStateData appState, boolean notifyApp) { {noformat} to RMStateStore, then call this new method in RMAppImpl#AttemptLaunchedTransition instead of updateApplicationState(ApplicationStateData). Previously we sent an event on every app launch; with this approach we can avoid sending these unnecessary events only to ignore them later. Thoughts? was (Author: jhung): Thanks for catching this. Perhaps we can implement it a different way. We can add a new {noformat} public RMStateUpdateAppEvent(ApplicationStateData appState, boolean notifyApplication) { {noformat} constructor to RMStateUpdateAppEvent and a new method {noformat} public void updateApplicationState(ApplicationStateData appState, boolean notifyApp) { {noformat} > RM logs InvalidStateTransitionException when app is submitted > - > > Key: YARN-9820 > URL: https://issues.apache.org/jira/browse/YARN-9820 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Assignee: Prabhu Joseph >Priority: Critical > Attachments: YARN-9820-001.patch, YARN-9820-002.patch > > > It is observed that RM logs InvalidStateTransitionException. Not sure what is > the impact but it's better to handle it. 
> {noformat} > 2019-09-08 12:40:46,327 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > appattempt_1567926390667_0001_01 State change from ALLOCATED to LAUNCHED > on event = LAUNCHED > 2019-09-08 12:40:46,327 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: update the > launch time for applicationId: application_1567926390667_0001, attemptId: > appattempt_1567926390667_0001_01launchTime: 1567926646327 > 2019-09-08 12:40:46,328 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating > info for app: application_1567926390667_0001 > 2019-09-08 12:40:46,332 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: App: > application_1567926390667_0001 can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > APP_UPDATE_SAVED at ACCEPTED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:881) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1030) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1014) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133) > at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: 
yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted
[ https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925290#comment-16925290 ] Jonathan Hung edited comment on YARN-9820 at 9/9/19 1:46 AM: - Thanks for catching this. Perhaps we can implement it a different way. We can add a new {noformat} public RMStateUpdateAppEvent(ApplicationStateData appState, boolean notifyApplication) { {noformat} constructor to RMStateUpdateAppEvent and a new method {noformat} public void updateApplicationState(ApplicationStateData appState, boolean notifyApp) { {noformat} to RMStateStore, then call this new method with notifyApp = false in RMAppImpl#AttemptLaunchedTransition instead of updateApplicationState(ApplicationStateData). Previously we sent an event on every app launch; with this approach we can avoid sending these unnecessary events only to ignore them later. Thoughts? was (Author: jhung): Thanks for catching this. Perhaps we can implement it a different way. We can add a new {noformat} public RMStateUpdateAppEvent(ApplicationStateData appState, boolean notifyApplication) { {noformat} constructor to RMStateUpdateAppEvent and a new method {noformat} public void updateApplicationState(ApplicationStateData appState, boolean notifyApp) { {noformat} to RMStateStore, then call this new method in RMAppImpl#AttemptLaunchedTransition instead of updateApplicationState(ApplicationStateData). Previously we sent an event on every app launch; with this approach we can avoid sending these unnecessary events only to ignore them later. Thoughts? > RM logs InvalidStateTransitionException when app is submitted > - > > Key: YARN-9820 > URL: https://issues.apache.org/jira/browse/YARN-9820 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Assignee: Prabhu Joseph >Priority: Critical > Attachments: YARN-9820-001.patch, YARN-9820-002.patch > > > It is observed that RM logs InvalidStateTransitionException. 
Not sure what is > the impact but its better to handle it. > {noformat} > 2019-09-08 12:40:46,327 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > appattempt_1567926390667_0001_01 State change from ALLOCATED to LAUNCHED > on event = LAUNCHED > 2019-09-08 12:40:46,327 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: update the > launch time for applicationId: application_1567926390667_0001, attemptId: > appattempt_1567926390667_0001_01launchTime: 1567926646327 > 2019-09-08 12:40:46,328 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating > info for app: application_1567926390667_0001 > 2019-09-08 12:40:46,332 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: App: > application_1567926390667_0001 can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > APP_UPDATE_SAVED at ACCEPTED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:881) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1030) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1014) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133) > at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian 
Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
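The proposal in the comment above (an extra boolean on the update event and on the store's update method) can be sketched as follows. This is a simplified illustration, not the attached patch: UpdateAppEventSketch and StateStoreSketch are hypothetical stand-ins for RMStateUpdateAppEvent and RMStateStore, the application state is reduced to a String, and the notification is recorded in a list instead of being dispatched as an APP_UPDATE_SAVED event.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for RMStateUpdateAppEvent (assumption, not the Hadoop class).
class UpdateAppEventSketch {
  final String appState;
  final boolean notifyApplication;

  // Existing callers keep the old behaviour: the application is notified.
  UpdateAppEventSketch(String appState) {
    this(appState, true);
  }

  // New constructor carrying the flag proposed in the comment.
  UpdateAppEventSketch(String appState, boolean notifyApplication) {
    this.appState = appState;
    this.notifyApplication = notifyApplication;
  }
}

// Hypothetical stand-in for RMStateStore.
class StateStoreSketch {
  final List<String> saved = new ArrayList<>();
  final List<String> notifications = new ArrayList<>();

  // Old entry point, unchanged for existing callers.
  void updateApplicationState(String appState) {
    updateApplicationState(appState, true);
  }

  // New overload: the launch-time update would pass notifyApp = false.
  void updateApplicationState(String appState, boolean notifyApp) {
    handle(new UpdateAppEventSketch(appState, notifyApp));
  }

  private void handle(UpdateAppEventSketch event) {
    saved.add(event.appState);               // the state is always persisted
    if (event.notifyApplication) {
      notifications.add("APP_UPDATE_SAVED"); // sent only when the caller asked for it
    }
  }
}
```

With this shape, the launch-time update is still persisted, but no APP_UPDATE_SAVED event reaches an application that is in the ACCEPTED state, which is the transition the stack trace reports as invalid.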
[jira] [Commented] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted
[ https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925290#comment-16925290 ] Jonathan Hung commented on YARN-9820: - Thanks for catching this. Perhaps we can implement it a different way. We can add a new {noformat} public RMStateUpdateAppEvent(ApplicationStateData appState, boolean notifyApplication) {{noformat} constructor to RMStateUpdateAppEvent and a new method {noformat} public void updateApplicationState(ApplicationStateData appState, boolean notifyApp) { {noformat} > RM logs InvalidStateTransitionException when app is submitted > - > > Key: YARN-9820 > URL: https://issues.apache.org/jira/browse/YARN-9820 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Assignee: Prabhu Joseph >Priority: Critical > Attachments: YARN-9820-001.patch, YARN-9820-002.patch > > > It is observed that RM logs InvalidStateTransitionException. Not sure what is > the impact but its better to handle it. 
> {noformat} > 2019-09-08 12:40:46,327 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > appattempt_1567926390667_0001_01 State change from ALLOCATED to LAUNCHED > on event = LAUNCHED > 2019-09-08 12:40:46,327 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: update the > launch time for applicationId: application_1567926390667_0001, attemptId: > appattempt_1567926390667_0001_01launchTime: 1567926646327 > 2019-09-08 12:40:46,328 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating > info for app: application_1567926390667_0001 > 2019-09-08 12:40:46,332 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: App: > application_1567926390667_0001 can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > APP_UPDATE_SAVED at ACCEPTED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:881) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1030) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1014) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133) > at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: 
yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted
[ https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925237#comment-16925237 ]

Hadoop QA commented on YARN-9820:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 59s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 20m 3s | trunk passed |
| +1 | compile | 0m 45s | trunk passed |
| +1 | checkstyle | 0m 36s | trunk passed |
| +1 | mvnsite | 0m 48s | trunk passed |
| +1 | shadedclient | 12m 57s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 12s | trunk passed |
| +1 | javadoc | 0m 29s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 43s | the patch passed |
| +1 | compile | 0m 36s | the patch passed |
| +1 | javac | 0m 36s | the patch passed |
| +1 | checkstyle | 0m 30s | the patch passed |
| +1 | mvnsite | 0m 38s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 42s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 16s | the patch passed |
| +1 | javadoc | 0m 27s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 86m 40s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 26s | The patch does not generate ASF License warnings. |
| | | 141m 38s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9820 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12979781/YARN-9820-002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 633134f7739a 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ca32917 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24771/testReport/ |
| Max. process+thread count | 792 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24771/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> RM logs InvalidStateTransitionExceptio
[jira] [Commented] (YARN-9816) EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError
[ https://issues.apache.org/jira/browse/YARN-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925236#comment-16925236 ]

Hadoop QA commented on YARN-9816:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 25s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 17m 17s | trunk passed |
| +1 | compile | 0m 24s | trunk passed |
| +1 | checkstyle | 0m 20s | trunk passed |
| +1 | mvnsite | 0m 27s | trunk passed |
| +1 | shadedclient | 11m 39s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 30s | trunk passed |
| +1 | javadoc | 0m 23s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 23s | the patch passed |
| +1 | compile | 0m 18s | the patch passed |
| +1 | javac | 0m 18s | the patch passed |
| +1 | checkstyle | 0m 14s | the patch passed |
| +1 | mvnsite | 0m 21s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 11s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 39s | the patch passed |
| +1 | javadoc | 0m 20s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 3m 13s | hadoop-yarn-server-timeline-pluginstorage in the patch passed. |
| +1 | asflicense | 0m 31s | The patch does not generate ASF License warnings. |
| | | 50m 10s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9816 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12979784/YARN-9816-001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux f1ec3d3a7caa 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ca32917 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24772/testReport/ |
| Max. process+thread count | 413 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timeline-pluginstorage |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/24772/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> EntityGroupFSTime
[jira] [Updated] (YARN-9816) EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError
[ https://issues.apache.org/jira/browse/YARN-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9816: Attachment: YARN-9816-001.patch > EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError > --- > > Key: YARN-9816 > URL: https://issues.apache.org/jira/browse/YARN-9816 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 3.1.0, 3.2.0, 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9816-001.patch > > > EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError. > This happens when a file is present under /ats/active. > {code} > [hdfs@node2 yarn]$ hadoop fs -ls /ats/active > Found 1 items > -rw-r--r-- 3 hdfs hadoop 0 2019-09-06 16:34 > /ats/active/.distcp.tmp.attempt_155759136_39768_m_01_0 > {code} > Error Message: > {code:java} > java.lang.StackOverflowError > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:632) > at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185) > at com.sun.proxy.$Proxy15.getListing(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2143) > at > org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1076) > at > org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1088) > at > 
org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1059) > at > org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1038) > at > org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1034) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1046) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.list(EntityGroupFSTimelineStore.java:398) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:368) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > 
org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > at > org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) > {code} > One of our users tried to distcp the hdfs://ats/active dir. The distcp job > created the temp file .distcp.tmp.attempt_155759136_39768_m_01_0 and failed to > delete it at the end, which caused the
[jira] [Updated] (YARN-9816) EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError
[ https://issues.apache.org/jira/browse/YARN-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9816: Affects Version/s: 3.1.0 3.2.0
[jira] [Updated] (YARN-9816) EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError
[ https://issues.apache.org/jira/browse/YARN-9816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9816: Description: EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError. This happens when a file is present under /ats/active. {code} [hdfs@node2 yarn]$ hadoop fs -ls /ats/active Found 1 items -rw-r--r-- 3 hdfs hadoop 0 2019-09-06 16:34 /ats/active/.distcp.tmp.attempt_155759136_39768_m_01_0 {code} Error Message: {code:java} java.lang.StackOverflowError at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:632) at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185) at com.sun.proxy.$Proxy15.getListing(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:2143) at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1076) at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1088) at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1059) at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1038) at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1034) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusIterator(DistributedFileSystem.java:1046) at 
org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.list(EntityGroupFSTimelineStore.java:398) at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:368) at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) at org.apache.hadoop.yarn.server.timeline.EntityGroupFSTimelineStore.scanActiveLogs(EntityGroupFSTimelineStore.java:383) {code} One of our users tried to distcp the hdfs://ats/active dir. 
The distcp job created the temp file .distcp.tmp.attempt_155759136_39768_m_01_0 and failed to delete it at the end, which caused the EntityLogScanner thread to crash with a StackOverflowError. was: EntityGroupFSTimelineStore#scanActiveLogs fails with StackOverflowError. This happens when an Invalid applicationDir is present in /ats/active. {code} [hdfs@node2 yarn]$ hadoop fs -ls /ats/active Found 1 items -rw-r--r-- 3 hdfs hadoop 0 2019-09-06 16:34 /ats/active/.distcp.tmp.attempt_155759136_39768_m_01_0 {code} Error Message: {code:java} java.lang.StackOverflowError at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:632) at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at sun.reflect.DelegatingMetho
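The repeated scanActiveLogs frames in the trace suggest a recursive directory scan that keeps descending into the stray file. The Java sketch below shows that failure mode and a guard against it on a toy in-memory file system. It assumes that listing a plain file returns the file itself, which is what makes an unguarded recursion loop forever; the class ScanActiveLogsSketch and all of its methods are hypothetical and do not reproduce the real EntityGroupFSTimelineStore code or the attached patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ScanActiveLogsSketch {
    // Toy file system: directory path -> child paths.
    // Any path that is not a key in this map is treated as a plain file.
    private final Map<String, List<String>> dirs = new HashMap<>();

    public void addDir(String path, String... children) {
        dirs.put(path, Arrays.asList(children));
    }

    public boolean isDirectory(String path) {
        return dirs.containsKey(path);
    }

    // Assumption: listing a plain file yields the file itself.
    public List<String> list(String path) {
        List<String> children = dirs.get(path);
        return children != null ? children : Collections.singletonList(path);
    }

    // Hypothetical check standing in for "this entry is an application directory".
    public static boolean looksLikeAppDir(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        return name.startsWith("application_");
    }

    // Guarded scan: counts application entries and only recurses into real directories.
    // Without the isDirectory check, a stray file would be listed, return itself,
    // and be scanned again until the stack overflows.
    public int scan(String dir) {
        int found = 0;
        for (String child : list(dir)) {
            if (looksLikeAppDir(child)) {
                found++;
            } else if (isDirectory(child)) {
                found += scan(child);
            }
            // Otherwise: a stray file such as a leftover distcp temp file; skip it.
        }
        return found;
    }
}
```

A scan over a directory that holds one application entry, one subdirectory with a second application entry, and the leftover .distcp.tmp file terminates and counts two entries, because the temp file is skipped instead of being scanned again.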
[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down
[ https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925219#comment-16925219 ] Prabhu Joseph commented on YARN-9821: - [~abmodi] Can you review this Jira when you get time? This fixes the NodeManager getting blocked at serviceStop when the ATSv2 backend HBase is down. > NM hangs at serviceStop when ATSV2 Backend Hbase is Down > - > > Key: YARN-9821 > URL: https://issues.apache.org/jira/browse/YARN-9821 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.2.0, 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9821-001.patch > > > NM hangs at serviceStop when ATSV2 Backend Hbase is Down. > {code} > "Thread-197" #302 prio=5 os_prio=0 tid=0x7f5f389ba000 nid=0x631d waiting > for monitor entry [0x7f5f1f29b000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:249) > - waiting to lock <0x0006c834d148> (a > org.apache.hadoop.hbase.client.BufferedMutatorImpl) > at > org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.close(TypedBufferedMutator.java:62) > at > org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.serviceStop(HBaseTimelineWriterImpl.java:636) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05808> (a java.lang.Object) > at > org.apache.hadoop.service.AbstractService.close(AbstractService.java:247) > at > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.serviceStop(TimelineCollectorManager.java:244) > at > org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStop(NodeTimelineCollectorManager.java:164) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05890> (a java.lang.Object) > at > 
org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStop(PerNodeTimelineCollectorsAuxService.java:113) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c058f8> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStop(AuxServices.java:330) > - locked <0x0006c7c23400> (a java.util.Collections$SynchronizedMap) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c059a8> (a java.lang.Object) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStop(ContainerManagerImpl.java:720) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05a98> (a java.lang.Object) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:526) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05c88> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager$1.run(NodeManager.java:552) > > > "qtp183259297-76" #76 daemon prio=5 os_prio=0 tid=0x7f5f567ed000 > nid=0x5fb7 in Object.wait() [0x7f5f23ad7000] >java.lang.Thread.State: 
TIMED_WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:460) > at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348) > at > org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForSpecificCompletedTask(ResultBoundedCompletionService.java:258) > - locked <0x000784ee8220> (a > [Lorg.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture;) > at > org.apache.h
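The thread dump shows the stop path blocked inside a close() call that is waiting for a monitor held by another thread. One generic way to keep a shutdown from hanging on such a call is to run the close on a helper thread and wait for it only up to a deadline. The Java sketch below illustrates that idea; BoundedCloseSketch and closeWithTimeout are hypothetical names, and nothing here is claimed to be what YARN-9821-001.patch actually does.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedCloseSketch {

    // Attempts to close the resource, giving up after the timeout.
    // Returns true only if close() completed normally within the timeout.
    public static boolean closeWithTimeout(AutoCloseable resource, long timeout, TimeUnit unit) {
        // Daemon thread: a close() stuck waiting for a monitor cannot be interrupted,
        // so the helper thread must not keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-close");
            t.setDaemon(true);
            return t;
        });
        Future<?> future = executor.submit(() -> {
            resource.close();
            return null;
        });
        try {
            future.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // best effort; has no effect on a thread blocked on a monitor
            return false;
        } catch (ExecutionException e) {
            return false; // close() itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The trade-off is that a timed-out close may leave buffered data unflushed, so a caller would normally log the timeout rather than ignore it.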
[jira] [Updated] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted
[ https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9820: Attachment: YARN-9820-002.patch > RM logs InvalidStateTransitionException when app is submitted > - > > Key: YARN-9820 > URL: https://issues.apache.org/jira/browse/YARN-9820 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Assignee: Prabhu Joseph >Priority: Critical > Attachments: YARN-9820-001.patch, YARN-9820-002.patch > > > It is observed that RM logs InvalidStateTransitionException. Not sure what is > the impact but its better to handle it. > {noformat} > 2019-09-08 12:40:46,327 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > appattempt_1567926390667_0001_01 State change from ALLOCATED to LAUNCHED > on event = LAUNCHED > 2019-09-08 12:40:46,327 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: update the > launch time for applicationId: application_1567926390667_0001, attemptId: > appattempt_1567926390667_0001_01launchTime: 1567926646327 > 2019-09-08 12:40:46,328 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating > info for app: application_1567926390667_0001 > 2019-09-08 12:40:46,332 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: App: > application_1567926390667_0001 can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > APP_UPDATE_SAVED at ACCEPTED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:881) > at > 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1030) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1014) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133) > at java.lang.Thread.run(Thread.java:748) > {noformat}
[jira] [Commented] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted
[ https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925210#comment-16925210 ]

Hadoop QA commented on YARN-9820:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 39s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 18m 39s | trunk passed |
| +1 | compile | 0m 44s | trunk passed |
| +1 | checkstyle | 0m 46s | trunk passed |
| +1 | mvnsite | 0m 59s | trunk passed |
| +1 | shadedclient | 14m 23s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 25s | trunk passed |
| +1 | javadoc | 0m 40s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 51s | the patch passed |
| +1 | compile | 0m 43s | the patch passed |
| +1 | javac | 0m 43s | the patch passed |
| -0 | checkstyle | 0m 36s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 102 unchanged - 0 fixed = 103 total (was 102) |
| +1 | mvnsite | 0m 50s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 55s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 15s | the patch passed |
| +1 | javadoc | 0m 28s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 84m 59s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 141m 52s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9820 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12979776/YARN-9820-001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux e0457c1a6d59 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ca32917 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/24769/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24769/testReport/ |
| Max. process+thread count | 813 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-
[jira] [Commented] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down
[ https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925207#comment-16925207 ]

Hadoop QA commented on YARN-9821:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 38s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 18m 33s | trunk passed |
| +1 | compile | 0m 22s | trunk passed |
| +1 | checkstyle | 0m 17s | trunk passed |
| +1 | mvnsite | 0m 23s | trunk passed |
| +1 | shadedclient | 12m 30s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 27s | trunk passed |
| +1 | javadoc | 0m 20s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 23s | the patch passed |
| +1 | compile | 0m 16s | the patch passed |
| +1 | javac | 0m 16s | the patch passed |
| +1 | checkstyle | 0m 13s | the patch passed |
| +1 | mvnsite | 0m 19s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 0s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 36s | the patch passed |
| +1 | javadoc | 0m 18s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 26s | hadoop-yarn-server-timelineservice-hbase-client in the patch passed. |
| +1 | asflicense | 0m 24s | The patch does not generate ASF License warnings. |
| | | 49m 56s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.2 Server=19.03.2 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9821 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12979778/YARN-9821-001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 64a92ab10bad 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / ca32917 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/24770/testReport/ |
| Max. process+thread count | 307 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase
[jira] [Updated] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down
[ https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9821: Attachment: YARN-9821-001.patch > NM hangs at serviceStop when ATSV2 Backend Hbase is Down > - > > Key: YARN-9821 > URL: https://issues.apache.org/jira/browse/YARN-9821 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.2.0, 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-9821-001.patch > > > NM hangs at serviceStop when ATSV2 Backend Hbase is Down. > {code} > "Thread-197" #302 prio=5 os_prio=0 tid=0x7f5f389ba000 nid=0x631d waiting > for monitor entry [0x7f5f1f29b000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:249) > - waiting to lock <0x0006c834d148> (a > org.apache.hadoop.hbase.client.BufferedMutatorImpl) > at > org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.close(TypedBufferedMutator.java:62) > at > org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.serviceStop(HBaseTimelineWriterImpl.java:636) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05808> (a java.lang.Object) > at > org.apache.hadoop.service.AbstractService.close(AbstractService.java:247) > at > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.serviceStop(TimelineCollectorManager.java:244) > at > org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStop(NodeTimelineCollectorManager.java:164) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05890> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStop(PerNodeTimelineCollectorsAuxService.java:113) > at > 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c058f8> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStop(AuxServices.java:330) > - locked <0x0006c7c23400> (a java.util.Collections$SynchronizedMap) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c059a8> (a java.lang.Object) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStop(ContainerManagerImpl.java:720) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05a98> (a java.lang.Object) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:526) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05c88> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager$1.run(NodeManager.java:552) > > > "qtp183259297-76" #76 daemon prio=5 os_prio=0 tid=0x7f5f567ed000 > nid=0x5fb7 in Object.wait() [0x7f5f23ad7000] >java.lang.Thread.State: TIMED_WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:460) > at 
java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348) > at > org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForSpecificCompletedTask(ResultBoundedCompletionService.java:258) > - locked <0x000784ee8220> (a > [Lorg.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture;) > at > org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForFirstSuccessfullyCompletedTask(ResultBoundedCompletionService.java:214) > at > org.apache.hadoop.hbase.c
[jira] [Updated] (YARN-9822) TimelineCollectorWebService#putEntities blocked when ATSV2 HBase is down.
[ https://issues.apache.org/jira/browse/YARN-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9822: Parent: YARN-9802 Issue Type: Sub-task (was: Bug) > TimelineCollectorWebService#putEntities blocked when ATSV2 HBase is down. > - > > Key: YARN-9822 > URL: https://issues.apache.org/jira/browse/YARN-9822 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.2.0, 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > > TimelineCollectorWebService#putEntities blocked when ATSV2 HBase is down. > YARN-9374 prevents the threads getting blocked when it has already identified > that Hbase down before accessing Hbase. TimelineCollector can check if the > Writer Backend is up or down before locking the writer. > {code} > synchronized (writer) { > response = writeTimelineEntities(entities, callerUgi); > flushBufferedTimelineEntities(); > } > {code} > {code} > "qtp183259297-80" #80 daemon prio=5 os_prio=0 tid=0x7f5f567fd000 > nid=0x5fbb waiting for monitor entry [0x7f5f236d4000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector.putEntities(TimelineCollector.java:164) > - waiting to lock <0x0006c7c05770> (a > org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl) > at > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorWebService.putEntities(TimelineCollectorWebService.java:186) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) > at > 
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205) > at > com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) > at > com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) > at > com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) > at > com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) > at > com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) > at > com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) > at > com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542) > at > com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473) > at > com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419) > at > com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409) > at > com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409) > at > com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558) > at > com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644) > at > 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:304) > at > org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > at > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1624) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:175
[jira] [Updated] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down
[ https://issues.apache.org/jira/browse/YARN-9821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9821: Parent: YARN-9802 Issue Type: Sub-task (was: Bug) > NM hangs at serviceStop when ATSV2 Backend Hbase is Down > - > > Key: YARN-9821 > URL: https://issues.apache.org/jira/browse/YARN-9821 > Project: Hadoop YARN > Issue Type: Sub-task > Components: ATSv2 >Affects Versions: 3.2.0, 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > > NM hangs at serviceStop when ATSV2 Backend Hbase is Down. > {code} > "Thread-197" #302 prio=5 os_prio=0 tid=0x7f5f389ba000 nid=0x631d waiting > for monitor entry [0x7f5f1f29b000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:249) > - waiting to lock <0x0006c834d148> (a > org.apache.hadoop.hbase.client.BufferedMutatorImpl) > at > org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.close(TypedBufferedMutator.java:62) > at > org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.serviceStop(HBaseTimelineWriterImpl.java:636) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05808> (a java.lang.Object) > at > org.apache.hadoop.service.AbstractService.close(AbstractService.java:247) > at > org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.serviceStop(TimelineCollectorManager.java:244) > at > org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStop(NodeTimelineCollectorManager.java:164) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05890> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStop(PerNodeTimelineCollectorsAuxService.java:113) > at > 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c058f8> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStop(AuxServices.java:330) > - locked <0x0006c7c23400> (a java.util.Collections$SynchronizedMap) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c059a8> (a java.lang.Object) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStop(ContainerManagerImpl.java:720) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05a98> (a java.lang.Object) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:526) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220) > - locked <0x0006c7c05c88> (a java.lang.Object) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager$1.run(NodeManager.java:552) > > > "qtp183259297-76" #76 daemon prio=5 os_prio=0 tid=0x7f5f567ed000 > nid=0x5fb7 in Object.wait() [0x7f5f23ad7000] >java.lang.Thread.State: TIMED_WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:460) > at 
java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348) > at > org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForSpecificCompletedTask(ResultBoundedCompletionService.java:258) > - locked <0x000784ee8220> (a > [Lorg.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture;) > at > org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForFirstSuccessfullyCompletedTask(ResultBoundedCompletionService.java:214) > at > org.apache.hadoop.hbase.client.ScannerCalla
[jira] [Assigned] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted
[ https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph reassigned YARN-9820: --- Assignee: Prabhu Joseph > RM logs InvalidStateTransitionException when app is submitted > - > > Key: YARN-9820 > URL: https://issues.apache.org/jira/browse/YARN-9820 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Assignee: Prabhu Joseph >Priority: Critical > Attachments: YARN-9820-001.patch > > > It is observed that RM logs InvalidStateTransitionException. Not sure what is > the impact but its better to handle it. > {noformat} > 2019-09-08 12:40:46,327 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > appattempt_1567926390667_0001_01 State change from ALLOCATED to LAUNCHED > on event = LAUNCHED > 2019-09-08 12:40:46,327 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: update the > launch time for applicationId: application_1567926390667_0001, attemptId: > appattempt_1567926390667_0001_01launchTime: 1567926646327 > 2019-09-08 12:40:46,328 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating > info for app: application_1567926390667_0001 > 2019-09-08 12:40:46,332 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: App: > application_1567926390667_0001 can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > APP_UPDATE_SAVED at ACCEPTED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:881) > at > 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1030) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1014) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133) > at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted
[ https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925179#comment-16925179 ]

Prabhu Joseph commented on YARN-9820:

As per the YARN-9438 patch, it looks like we can ignore the {{APP_UPDATE_SAVED}} event when the app is in the {{ACCEPTED}} state.

[~jhung] [~haibo.chen] Can you review this Jira when you get time? Thanks.

> RM logs InvalidStateTransitionException when app is submitted
>
> Key: YARN-9820
> URL: https://issues.apache.org/jira/browse/YARN-9820
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Rohith Sharma K S
> Priority: Critical
> Attachments: YARN-9820-001.patch
>
> It is observed that RM logs InvalidStateTransitionException. Not sure what is the impact but its better to handle it.
> {noformat}
> 2019-09-08 12:40:46,327 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1567926390667_0001_01 State change from ALLOCATED to LAUNCHED on event = LAUNCHED
> 2019-09-08 12:40:46,327 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: update the launch time for applicationId: application_1567926390667_0001, attemptId: appattempt_1567926390667_0001_01launchTime: 1567926646327
> 2019-09-08 12:40:46,328 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1567926390667_0001
> 2019-09-08 12:40:46,332 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: App: application_1567926390667_0001 can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: APP_UPDATE_SAVED at ACCEPTED
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:881)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1030)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1014)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
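The suggestion in the comment above (ignore APP_UPDATE_SAVED while the app is ACCEPTED) can be illustrated with a toy transition table. This is only a sketch under stated assumptions: the class `AppStateMachineSketch`, its two-state enum, and the constructor flag are hypothetical and are not the RMAppImpl or StateMachineFactory code, and the attached patch may do this differently. It shows how registering a self-transition for an event makes the state machine accept it as a no-op instead of rejecting it.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical, simplified state machine; not the actual RMAppImpl code. */
final class AppStateMachineSketch {
  enum State { ACCEPTED, RUNNING }
  enum Event { ATTEMPT_REGISTERED, APP_UPDATE_SAVED }

  // Transition table: current state -> (event -> next state).
  private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
  private State current = State.ACCEPTED;

  AppStateMachineSketch(boolean ignoreUpdateSavedAtAccepted) {
    addTransition(State.ACCEPTED, Event.ATTEMPT_REGISTERED, State.RUNNING);
    if (ignoreUpdateSavedAtAccepted) {
      // Self-transition: the event is accepted and the state is unchanged.
      addTransition(State.ACCEPTED, Event.APP_UPDATE_SAVED, State.ACCEPTED);
    }
  }

  private void addTransition(State from, Event on, State to) {
    table.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, to);
  }

  /** Applies the event, or throws if no transition is registered for it. */
  State handle(Event event) {
    Map<Event, State> row = table.get(current);
    State next = (row == null) ? null : row.get(event);
    if (next == null) {
      throw new IllegalStateException("Invalid event: " + event + " at " + current);
    }
    current = next;
    return current;
  }

  public static void main(String[] args) {
    AppStateMachineSketch tolerant = new AppStateMachineSketch(true);
    System.out.println(tolerant.handle(Event.APP_UPDATE_SAVED));   // stays ACCEPTED
    System.out.println(tolerant.handle(Event.ATTEMPT_REGISTERED)); // moves to RUNNING

    try {
      new AppStateMachineSketch(false).handle(Event.APP_UPDATE_SAVED);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // same shape as the logged error
    }
  }
}
```

With the flag off, the sketch fails in the same way as the logged "Invalid event: APP_UPDATE_SAVED at ACCEPTED" error; with it on, the event is absorbed and later transitions proceed normally.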
[jira] [Updated] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted
[ https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-9820: Attachment: YARN-9820-001.patch > RM logs InvalidStateTransitionException when app is submitted > - > > Key: YARN-9820 > URL: https://issues.apache.org/jira/browse/YARN-9820 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Priority: Critical > Attachments: YARN-9820-001.patch > > > It is observed that RM logs InvalidStateTransitionException. Not sure what is > the impact but its better to handle it. > {noformat} > 2019-09-08 12:40:46,327 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > appattempt_1567926390667_0001_01 State change from ALLOCATED to LAUNCHED > on event = LAUNCHED > 2019-09-08 12:40:46,327 INFO > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: update the > launch time for applicationId: application_1567926390667_0001, attemptId: > appattempt_1567926390667_0001_01launchTime: 1567926646327 > 2019-09-08 12:40:46,328 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating > info for app: application_1567926390667_0001 > 2019-09-08 12:40:46,332 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: App: > application_1567926390667_0001 can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > APP_UPDATE_SAVED at ACCEPTED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:881) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1030) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1014) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133) > at java.lang.Thread.run(Thread.java:748) > {noformat}
[jira] [Created] (YARN-9822) TimelineCollectorWebService#putEntities blocked when ATSV2 HBase is down.
Prabhu Joseph created YARN-9822:

Summary: TimelineCollectorWebService#putEntities blocked when ATSV2 HBase is down.
Key: YARN-9822
URL: https://issues.apache.org/jira/browse/YARN-9822
Project: Hadoop YARN
Issue Type: Bug
Components: ATSv2
Affects Versions: 3.2.0, 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph

TimelineCollectorWebService#putEntities is blocked when the ATSv2 HBase backend is down. YARN-9374 prevents threads from getting blocked when it has already identified that HBase is down before accessing HBase. TimelineCollector can check whether the writer backend is up or down before locking the writer.

{code}
synchronized (writer) {
  response = writeTimelineEntities(entities, callerUgi);
  flushBufferedTimelineEntities();
}
{code}

{code}
"qtp183259297-80" #80 daemon prio=5 os_prio=0 tid=0x7f5f567fd000 nid=0x5fbb waiting for monitor entry [0x7f5f236d4000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollector.putEntities(TimelineCollector.java:164)
    - waiting to lock <0x0006c7c05770> (a org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl)
    at org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorWebService.putEntities(TimelineCollectorWebService.java:186)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
    at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:644)
    at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:304)
    at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:592)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
    at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1624)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
    at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
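
The YARN-9822 report suggests checking whether the writer backend is up before taking the writer lock. A minimal sketch of that idea follows. It does not use the real Hadoop classes: `WriterHealthCheckSketch`, `Writer`, `Collector`, `setBackendUp` and `putEntity` are all hypothetical names invented for illustration, and the health flag is assumed to be maintained by some separate monitor.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative only: these classes are hypothetical stand-ins, not the
// actual TimelineCollector or HBaseTimelineWriterImpl from Hadoop.
class WriterHealthCheckSketch {

  static class Writer {
    // Assumed to be flipped by a separate health monitor.
    private final AtomicBoolean backendUp = new AtomicBoolean(true);
    private int written = 0;

    void setBackendUp(boolean up) { backendUp.set(up); }
    boolean isBackendUp() { return backendUp.get(); }

    void write(String entity) { written++; }
    int writtenCount() { return written; }
  }

  static class Collector {
    private final Writer writer;

    Collector(Writer writer) { this.writer = writer; }

    // Returns true if the entity was written, false if the call was
    // rejected up front because the backend is known to be down.
    boolean putEntity(String entity) {
      if (!writer.isBackendUp()) {
        // Fail fast without ever contending for the writer monitor.
        return false;
      }
      synchronized (writer) {
        writer.write(entity);
      }
      return true;
    }
  }

  public static void main(String[] args) {
    Writer writer = new Writer();
    Collector collector = new Collector(writer);
    System.out.println(collector.putEntity("entity-1"));
    writer.setBackendUp(false);
    System.out.println(collector.putEntity("entity-2"));
  }
}
```

The check is only a best-effort guard: the backend can still go down between the check and the `synchronized` block, so a thread that already holds the lock may block until its own write times out.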
[jira] [Created] (YARN-9821) NM hangs at serviceStop when ATSV2 Backend Hbase is Down
Prabhu Joseph created YARN-9821:

Summary: NM hangs at serviceStop when ATSV2 Backend Hbase is Down
Key: YARN-9821
URL: https://issues.apache.org/jira/browse/YARN-9821
Project: Hadoop YARN
Issue Type: Bug
Components: ATSv2
Affects Versions: 3.2.0, 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph

NM hangs at serviceStop when ATSV2 Backend Hbase is Down.

{code}
"Thread-197" #302 prio=5 os_prio=0 tid=0x7f5f389ba000 nid=0x631d waiting for monitor entry [0x7f5f1f29b000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.close(BufferedMutatorImpl.java:249)
    - waiting to lock <0x0006c834d148> (a org.apache.hadoop.hbase.client.BufferedMutatorImpl)
    at org.apache.hadoop.yarn.server.timelineservice.storage.common.TypedBufferedMutator.close(TypedBufferedMutator.java:62)
    at org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl.serviceStop(HBaseTimelineWriterImpl.java:636)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
    - locked <0x0006c7c05808> (a java.lang.Object)
    at org.apache.hadoop.service.AbstractService.close(AbstractService.java:247)
    at org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorManager.serviceStop(TimelineCollectorManager.java:244)
    at org.apache.hadoop.yarn.server.timelineservice.collector.NodeTimelineCollectorManager.serviceStop(NodeTimelineCollectorManager.java:164)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
    - locked <0x0006c7c05890> (a java.lang.Object)
    at org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService.serviceStop(PerNodeTimelineCollectorsAuxService.java:113)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
    - locked <0x0006c7c058f8> (a java.lang.Object)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceStop(AuxServices.java:330)
    - locked <0x0006c7c23400> (a java.util.Collections$SynchronizedMap)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
    - locked <0x0006c7c059a8> (a java.lang.Object)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
    at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
    at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
    at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStop(ContainerManagerImpl.java:720)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
    - locked <0x0006c7c05a98> (a java.lang.Object)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:54)
    at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:102)
    at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:158)
    at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:132)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:526)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:220)
    - locked <0x0006c7c05c88> (a java.lang.Object)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager$1.run(NodeManager.java:552)

"qtp183259297-76" #76 daemon prio=5 os_prio=0 tid=0x7f5f567ed000 nid=0x5fb7 in Object.wait() [0x7f5f23ad7000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:460)
    at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
    at org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForSpecificCompletedTask(ResultBoundedCompletionService.java:258)
    - locked <0x000784ee8220> (a [Lorg.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture;)
    at org.apache.hadoop.hbase.client.ResultBoundedCompletionService.pollForFirstSuccessfullyCompletedTask(ResultBoundedCompletionService.java:214)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:228)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)
    at org.apache.hadoop.hbase.client.ClientScanner.c
[jira] [Updated] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted
[ https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith Sharma K S updated YARN-9820:
    Target Version/s: (was: 3.2.2)

> RM logs InvalidStateTransitionException when app is submitted
>
> Key: YARN-9820
> URL: https://issues.apache.org/jira/browse/YARN-9820
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Rohith Sharma K S
> Priority: Critical

--
This message was sent by Atlassian Jira (v8.3.2#803003)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted
[ https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith Sharma K S updated YARN-9820:
    Affects Version/s: (was: 3.2.1)

> RM logs InvalidStateTransitionException when app is submitted
>
> Key: YARN-9820
> URL: https://issues.apache.org/jira/browse/YARN-9820
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Rohith Sharma K S
> Priority: Critical
[jira] [Updated] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted
[ https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith Sharma K S updated YARN-9820:
    Target Version/s: 3.2.2

> RM logs InvalidStateTransitionException when app is submitted
>
> Key: YARN-9820
> URL: https://issues.apache.org/jira/browse/YARN-9820
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.2.1
> Reporter: Rohith Sharma K S
> Priority: Critical
[jira] [Commented] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted
[ https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925135#comment-16925135 ]

Rohith Sharma K S commented on YARN-9820:

YARN-9438 causes an update event to be triggered immediately after app submission. This is an expected event, so it needs to be ignored in RMAppImpl. cc: [~jhung] [~haibochen]

> RM logs InvalidStateTransitionException when app is submitted
>
> Key: YARN-9820
> URL: https://issues.apache.org/jira/browse/YARN-9820
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.2.1
> Reporter: Rohith Sharma K S
> Priority: Critical
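
To illustrate what "ignoring" an expected event in a state machine can look like, here is a small self-contained sketch. It is a toy model, not the Hadoop `StateMachineFactory` API or the actual YARN-9820 patch: `IgnoredEventSketch`, `Machine`, `addTransition` and `ignore` are hypothetical names, and the states and events are a reduced, illustrative subset. The assumption is that an ignored event is registered as a self-transition, so the machine stays in its current state instead of raising an error.

```java
import java.util.EnumMap;
import java.util.Map;

// Toy state machine; all names here are illustrative assumptions.
class IgnoredEventSketch {

  enum State { NEW, ACCEPTED, RUNNING }

  enum Event { START, APP_UPDATE_SAVED, ATTEMPT_REGISTERED }

  static class InvalidTransition extends RuntimeException {
    InvalidTransition(String message) { super(message); }
  }

  static class Machine {
    private final Map<State, Map<Event, State>> transitions =
        new EnumMap<>(State.class);
    private State current;

    Machine(State initial) { this.current = initial; }

    Machine addTransition(State from, Event on, State to) {
      transitions
          .computeIfAbsent(from, s -> new EnumMap<>(Event.class))
          .put(on, to);
      return this;
    }

    // Ignoring an event is modelled as a transition back to the same state.
    Machine ignore(State state, Event on) {
      return addTransition(state, on, state);
    }

    State handle(Event event) {
      Map<Event, State> byEvent = transitions.get(current);
      State next = (byEvent == null) ? null : byEvent.get(event);
      if (next == null) {
        throw new InvalidTransition(
            "Invalid event: " + event + " at " + current);
      }
      current = next;
      return current;
    }
  }

  public static void main(String[] args) {
    Machine machine = new Machine(State.ACCEPTED)
        .addTransition(State.ACCEPTED, Event.ATTEMPT_REGISTERED, State.RUNNING)
        .ignore(State.ACCEPTED, Event.APP_UPDATE_SAVED);
    System.out.println(machine.handle(Event.APP_UPDATE_SAVED));
    System.out.println(machine.handle(Event.ATTEMPT_REGISTERED));
  }
}
```

Without the `ignore` registration, handling `APP_UPDATE_SAVED` in `ACCEPTED` throws, which mirrors the error shape reported in the issue.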
[jira] [Updated] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted
[ https://issues.apache.org/jira/browse/YARN-9820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohith Sharma K S updated YARN-9820:
    Affects Version/s: 3.2.1

> RM logs InvalidStateTransitionException when app is submitted
>
> Key: YARN-9820
> URL: https://issues.apache.org/jira/browse/YARN-9820
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.2.1
> Reporter: Rohith Sharma K S
> Priority: Critical
[jira] [Created] (YARN-9820) RM logs InvalidStateTransitionException when app is submitted
Rohith Sharma K S created YARN-9820:

Summary: RM logs InvalidStateTransitionException when app is submitted
Key: YARN-9820
URL: https://issues.apache.org/jira/browse/YARN-9820
Project: Hadoop YARN
Issue Type: Bug
Reporter: Rohith Sharma K S

It is observed that the RM logs an InvalidStateTransitionException. Not sure what the impact is, but it's better to handle it.

{noformat}
2019-09-08 12:40:46,327 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1567926390667_0001_01 State change from ALLOCATED to LAUNCHED on event = LAUNCHED
2019-09-08 12:40:46,327 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: update the launch time for applicationId: application_1567926390667_0001, attemptId: appattempt_1567926390667_0001_01launchTime: 1567926646327
2019-09-08 12:40:46,328 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1567926390667_0001
2019-09-08 12:40:46,332 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: App: application_1567926390667_0001 can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: APP_UPDATE_SAVED at ACCEPTED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:881)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1030)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:1014)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:219)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:133)
    at java.lang.Thread.run(Thread.java:748)
{noformat}
[jira] [Commented] (YARN-9605) Add ZkConfiguredFailoverProxyProvider for RM HA
[ https://issues.apache.org/jira/browse/YARN-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925103#comment-16925103 ]

zhoukang commented on YARN-9605:

[~Prabhu Joseph] [~tangzhankun] Could you help review this patch, please? Thanks a lot.

> Add ZkConfiguredFailoverProxyProvider for RM HA
>
> Key: YARN-9605
> URL: https://issues.apache.org/jira/browse/YARN-9605
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: zhoukang
> Assignee: zhoukang
> Priority: Major
> Fix For: 3.2.0, 3.1.2
> Attachments: YARN-9605.001.patch
>
> In this issue, I will track a new feature to support ZkConfiguredFailoverProxyProvider for RM HA.
[jira] [Commented] (YARN-9612) Support using ip to register NodeID
[ https://issues.apache.org/jira/browse/YARN-9612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925102#comment-16925102 ]

zhoukang commented on YARN-9612:

[~tangzhankun] IIUC, the solution is to add a service name for each pod? I think that is not very elegant.

> Support using ip to register NodeID
>
> Key: YARN-9612
> URL: https://issues.apache.org/jira/browse/YARN-9612
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: zhoukang
> Priority: Major
>
> In an environment like k8s, we should support using the IP when registering the NodeID with the RM, since the hostname will be the pod name, which cannot be resolved by the k8s DNS.
[jira] [Commented] (YARN-9739) appsTableData in AppsBlock may cause OOM
[ https://issues.apache.org/jira/browse/YARN-9739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925100#comment-16925100 ]

zhoukang commented on YARN-9739:

Any suggestions for this, [~tangzhankun]? Our current implementation is just a cache for that, which I think is not elegant enough.

> appsTableData in AppsBlock may cause OOM
>
> Key: YARN-9739
> URL: https://issues.apache.org/jira/browse/YARN-9739
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: zhoukang
> Priority: Major
> Attachments: heap0.png, heap1.png, stack.png
>
> If many users list the applications, it may cause an RM OOM.
[jira] [Commented] (YARN-9537) Add configuration to disable AM preemption
[ https://issues.apache.org/jira/browse/YARN-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925101#comment-16925101 ]

zhoukang commented on YARN-9537:

Yes, I will. [~yufeigu] Thanks a lot!

> Add configuration to disable AM preemption
>
> Key: YARN-9537
> URL: https://issues.apache.org/jira/browse/YARN-9537
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: fairscheduler
> Affects Versions: 3.2.0, 3.1.2
> Reporter: zhoukang
> Assignee: zhoukang
> Priority: Major
> Attachments: YARN-9537.001.patch
>
> In this issue, I will add a configuration to support disabling AM preemption.
[jira] [Commented] (YARN-9748) Allow capacity-scheduler configuration on HDFS
[ https://issues.apache.org/jira/browse/YARN-9748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925099#comment-16925099 ]

zhoukang commented on YARN-9748:

Sorry for the late reply, [~Prabhu Joseph]. I think the title is misleading. What we want in our production cluster is an auto-reload feature; maybe I should change the title?

> Allow capacity-scheduler configuration on HDFS
>
> Key: YARN-9748
> URL: https://issues.apache.org/jira/browse/YARN-9748
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler, capacityscheduler
> Affects Versions: 3.1.2
> Reporter: zhoukang
> Assignee: Prabhu Joseph
> Priority: Major
>
> Improvement:
> Support auto-reload from HDFS
[jira] [Updated] (YARN-9748) Allow capacity-scheduler configuration on HDFS and support reload from hdfs
[ https://issues.apache.org/jira/browse/YARN-9748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhoukang updated YARN-9748:
    Summary: Allow capacity-scheduler configuration on HDFS and support reload from hdfs (was: Allow capacity-scheduler configuration on HDFS)

> Allow capacity-scheduler configuration on HDFS and support reload from hdfs
>
> Key: YARN-9748
> URL: https://issues.apache.org/jira/browse/YARN-9748
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler, capacityscheduler
> Affects Versions: 3.1.2
> Reporter: zhoukang
> Assignee: Prabhu Joseph
> Priority: Major
>
> Improvement:
> Support auto-reload from HDFS
[jira] [Updated] (YARN-9748) Allow capacity-scheduler configuration on HDFS and support reload from HDFS
[ https://issues.apache.org/jira/browse/YARN-9748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhoukang updated YARN-9748:
    Summary: Allow capacity-scheduler configuration on HDFS and support reload from HDFS (was: Allow capacity-scheduler configuration on HDFS and support reload from hdfs)

> Allow capacity-scheduler configuration on HDFS and support reload from HDFS
>
> Key: YARN-9748
> URL: https://issues.apache.org/jira/browse/YARN-9748
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler, capacityscheduler
> Affects Versions: 3.1.2
> Reporter: zhoukang
> Assignee: Prabhu Joseph
> Priority: Major
>
> Improvement:
> Support auto-reload from HDFS
[jira] [Commented] (YARN-8199) Logging fileSize of log files under NM Local Dir
[ https://issues.apache.org/jira/browse/YARN-8199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925094#comment-16925094 ]

Prabhu Joseph commented on YARN-8199:

[~rohithsharma] Below is the commit id. We missed the Jira number in the commit message.

{code}
commit 54ac80176e8487b7a18cd9e16a11efa289d0b7df
Author: Szilard Nemeth
Date: Fri Aug 2 13:38:06 2019 +0200

    Logging fileSize of log files under NM Local Dir. Contributed by Prabhu Joseph
{code}

> Logging fileSize of log files under NM Local Dir
>
> Key: YARN-8199
> URL: https://issues.apache.org/jira/browse/YARN-8199
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: log-aggregation
> Affects Versions: 2.7.3
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Major
> Labels: supportability
> Fix For: 3.3.0, 3.2.1, 3.1.3
> Attachments: 0001-YARN-8199.patch, 0002-YARN-8199.patch, YARN-8199-003.patch, YARN-8199-004.patch, YARN-8199-branch-3.1.001.patch, YARN-8199-branch-3.2.001.patch
>
> Logging the file size of log files such as syslog, stderr and stdout under the NM Local Dir by the NodeManager before the cleanup will help to find the application whose logging is too verbose.
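
As a rough illustration of the idea in YARN-8199 (recording the size of each log file before it is cleaned up), the sketch below walks a directory and builds one line per regular file. It is not the committed NodeManager change: `LogFileSizeSketch`, `describeFiles` and `createDemoDir` are hypothetical names, the output format is made up, and a real implementation would write to the NodeManager's logger rather than return strings.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative only: a standalone sketch, not code from the Hadoop tree.
class LogFileSizeSketch {

  // Builds "<relative path> size=<N> bytes" for every regular file under dir,
  // in sorted path order.
  static List<String> describeFiles(Path dir) {
    List<String> lines = new ArrayList<>();
    try (Stream<Path> paths = Files.walk(dir)) {
      List<Path> sorted = paths.sorted().collect(Collectors.toList());
      for (Path p : sorted) {
        if (Files.isRegularFile(p)) {
          lines.add(dir.relativize(p) + " size=" + Files.size(p) + " bytes");
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return lines;
  }

  // Creates a temporary directory with two fake container log files.
  static Path createDemoDir() {
    try {
      Path dir = Files.createTempDirectory("container-logs");
      Files.write(dir.resolve("stderr"), new byte[10]);
      Files.write(dir.resolve("syslog"), new byte[2048]);
      return dir;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    for (String line : describeFiles(createDemoDir())) {
      System.out.println(line);
    }
  }
}
```

Emitting these lines just before deletion would make it possible to spot, after the fact, which container wrote unusually large logs.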
[jira] [Commented] (YARN-9787) Typo in analysesErrorMsg
[ https://issues.apache.org/jira/browse/YARN-9787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16925092#comment-16925092 ]

kevin su commented on YARN-9787:

Thanks to [~surendrasingh] and [~jojochuang] for the review and commit.

> Typo in analysesErrorMsg
>
> Key: YARN-9787
> URL: https://issues.apache.org/jira/browse/YARN-9787
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: yarn
> Affects Versions: 3.2.0
> Reporter: David Mollitor
> Assignee: kevin su
> Priority: Trivial
> Labels: newbie, noob
> Fix For: 3.3.0
> Attachments: YARN-9787.001.patch
>
> {code:java}
> analysis.append("Please check whether your etc/hadoop/mapred-site.xml "
>     + "contains the below configuration:\n");
> {code}
> I think it should be {{/etc/hadoop/mapred-site.xml}}
> https://github.com/apache/hadoop/blob/2064ca015d1584263aac0cc20c60b925a3aff612/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java#L788-L789