[jira] [Created] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING
lujie created YARN-6948:
---

Summary: Invalid event: ATTEMPT_ADDED at FINAL_SAVING
Key: YARN-6948
URL: https://issues.apache.org/jira/browse/YARN-6948
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 2.8.0
Reporter: lujie

When I send a kill command to a running job and check the logs, I find this exception:

{code:java}
2017-08-03 01:35:20,485 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ATTEMPT_ADDED at FINAL_SAVING
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
    at java.lang.Thread.run(Thread.java:745)
{code}

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-6950) Invalid event: LAUNCH_FAILED at FAILED
lujie created YARN-6950:
---

Summary: Invalid event: LAUNCH_FAILED at FAILED
Key: YARN-6950
URL: https://issues.apache.org/jira/browse/YARN-6950
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 2.6.0
Reporter: lujie

An RMAppAttemptImpl fails for some reason. Meanwhile, the AM fails to launch a container and sends a LAUNCH_FAILED event, which the state machine cannot handle:

{code:java}
2017-07-05 03:33:09,013 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: LAUNCH_FAILED at FAILED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
    at java.lang.Thread.run(Thread.java:745)
{code}
[jira] [Created] (YARN-6949) Invalid event: LOCALIZED at LOCALIZED
lujie created YARN-6949:
---

Summary: Invalid event: LOCALIZED at LOCALIZED
Key: YARN-6949
URL: https://issues.apache.org/jira/browse/YARN-6949
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 2.8.0
Reporter: lujie

While a job is running, I stop a NodeManager on one machine for some reason. When I then check the logs to see the running state, I find many InvalidStateTransitionException entries:

{code:java}
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: LOCALIZATION_FAILED at LOCALIZED
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource.handle(LocalizedResource.java:198)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.handle(LocalResourcesTrackerImpl.java:194)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.handle(LocalResourcesTrackerImpl.java:58)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1058)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:720)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:355)
    at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)
    at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:845)
{code}
[jira] [Commented] (YARN-6949) Invalid event: LOCALIZATION_FAILED at LOCALIZED
[ https://issues.apache.org/jira/browse/YARN-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115220#comment-16115220 ]

lujie commented on YARN-6949:
-

I checked the log and also found some NullPointerExceptions:

{code:java}
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:505)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1131)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1093)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:720)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:355)
    at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)
{code}
[jira] [Commented] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115988#comment-16115988 ]

lujie commented on YARN-6948:
-

From the actual logs:
# RMAppImpl: application_1501695223072_0001 State change from NEW to NEW_SAVING
# RMAppImpl: application_1501695223072_0001 State change from SUBMITTED to ACCEPTED
# RMAppAttemptImpl: appattempt_1501695223072_0001_01 State change from NEW to SUBMITTED
# RMAppImpl: application_1501695223072_0001 State change from ACCEPTED to KILLING
# CapacityScheduler: Added Application Attempt appattempt_1501695223072_0001_01 to scheduler from user lujie in queue default
# RMAppAttemptImpl: appattempt_1501695223072_0001_01 State change from SUBMITTED to FINAL_SAVING
# RMAppAttemptImpl: Invalid event: ATTEMPT_ADDED at FINAL_SAVING
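The log sequence above suggests that the scheduler's ATTEMPT_ADDED event reached the attempt only after the kill had already moved it to FINAL_SAVING. The following is a minimal sketch of that ordering using a toy transition table. It is not Hadoop's StateMachineFactory: the class `AttemptStateMachineSketch`, its methods, and the reduced state and event sets are hypothetical, and registering a self-transition that ignores the late event is only one possible remedy, assumed here for illustration.

```java
import java.util.EnumMap;
import java.util.Map;

// Toy state machine for illustration only; NOT Hadoop's StateMachineFactory.
class AttemptStateMachineSketch {
    enum State { NEW, SUBMITTED, SCHEDULED, FINAL_SAVING }
    enum Event { START, ATTEMPT_ADDED, KILL }

    static class InvalidTransitionException extends RuntimeException {
        InvalidTransitionException(Event event, State state) {
            super("Invalid event: " + event + " at " + state);
        }
    }

    // Transition table: current state -> (event -> next state).
    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.NEW;

    void addTransition(State from, Event on, State to) {
        table.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class)).put(on, to);
    }

    // Looks up (current, event); an unregistered pair is rejected.
    State handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new InvalidTransitionException(event, current);
        }
        current = next;
        return current;
    }

    // ignoreLateAttemptAdded = true registers a no-op self-transition (assumed remedy).
    static AttemptStateMachineSketch build(boolean ignoreLateAttemptAdded) {
        AttemptStateMachineSketch m = new AttemptStateMachineSketch();
        m.addTransition(State.NEW, Event.START, State.SUBMITTED);
        m.addTransition(State.SUBMITTED, Event.ATTEMPT_ADDED, State.SCHEDULED);
        m.addTransition(State.SUBMITTED, Event.KILL, State.FINAL_SAVING);
        if (ignoreLateAttemptAdded) {
            m.addTransition(State.FINAL_SAVING, Event.ATTEMPT_ADDED, State.FINAL_SAVING);
        }
        return m;
    }

    // Replays the ordering from the log: the kill is handled before ATTEMPT_ADDED.
    static String replay(AttemptStateMachineSketch machine) {
        try {
            machine.handle(Event.START);
            machine.handle(Event.KILL);
            machine.handle(Event.ATTEMPT_ADDED);
            return "ok, state=" + machine.current;
        } catch (InvalidTransitionException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replay(build(false))); // Invalid event: ATTEMPT_ADDED at FINAL_SAVING
        System.out.println(replay(build(true)));  // ok, state=FINAL_SAVING
    }
}
```

Without the extra table entry the replay ends with the same message as the reported log line; with it, the late event is absorbed and the attempt stays in FINAL_SAVING.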
[jira] [Updated] (YARN-7176) After job kill command is send, the ResourceManager was killed
[ https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie updated YARN-7176:

Description:
I submitted a job but had to kill it immediately for some reason. I then found that the RM had been killed. I checked the RM log and found an ArrayIndexOutOfBoundsException and a NullPointerException. According to the log, the RM was killed because of the NullPointerException, but I still don't understand why these exceptions happen. I have attached the whole RM log.

{code:java}
2017-09-08 02:34:37,967 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1504809243340_0001_01. Got exception: java.lang.ArrayIndexOutOfBoundsException: 3
    at java.util.ArrayList.add(ArrayList.java:441)
    at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
    at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
    at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
    at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
    at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
    at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2017-09-08 02:34:37,968 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating app: application_1504809243340_0001
java.lang.NullPointerException
    at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
    at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
    at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
    at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
    at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
    at org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
    at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
    at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
    at org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.getSerializedSize(YarnServerResourceManagerRecoveryProtos.java:816)
    at com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationStateInternal(FileSystemRMStateStore.java:426)
    at
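The two traces above are logged one millisecond apart, one from the AMLauncher and one from the RMStateStore, and both involve serializing a ContainerLaunchContextProto. An ArrayIndexOutOfBoundsException thrown from inside `ArrayList.add` is a common symptom of two threads mutating the same unsynchronized list. One possible reading, which the report does not confirm, is that both threads serialized the same record at once. The sketch below is a hypothetical stand-in, not Hadoop code: `SharedRecordSketch`, `Record`, and the placeholder ACL strings are made up, and it only shows how guarding the rebuild with `synchronized` keeps concurrent callers consistent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration; NOT the Hadoop ContainerLaunchContextPBImpl.
class SharedRecordSketch {
    // Stand-in for a record that copies local fields into a builder-owned list on demand.
    static class Record {
        private final List<String> localAcls = Arrays.asList("VIEW_APP", "MODIFY_APP"); // placeholder values
        private final List<String> builderAcls = new ArrayList<>();

        // synchronized: only one thread at a time may clear and refill the shared list.
        // Without it, concurrent clear()/addAll() calls on the ArrayList can corrupt its
        // internal state and may throw ArrayIndexOutOfBoundsException.
        synchronized List<String> getProto() {
            builderAcls.clear();
            builderAcls.addAll(localAcls);
            return new ArrayList<>(builderAcls); // defensive copy for the caller
        }
    }

    // Calls getProto() from several threads and reports whether every result was complete.
    static boolean runConcurrently(int threads, int iterations) {
        final Record record = new Record();
        final AtomicBoolean consistent = new AtomicBoolean(true);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    try {
                        if (record.getProto().size() != 2) {
                            consistent.set(false);
                        }
                    } catch (RuntimeException e) {
                        consistent.set(false);
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return consistent.get();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 10_000)); // prints "true"
    }
}
```

Whether the actual cause is a shared record, and whether synchronization or a per-thread copy is the better remedy, would need to be checked against the attached RM log and the Hadoop source.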
[jira] [Updated] (YARN-7176) After kill command is send, the job hangs
[ https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie updated YARN-7176:

Attachment: logs.rar
[jira] [Updated] (YARN-7176) After kill command is send, the job hangs
[ https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie updated YARN-7176:

[^C:\Users\Administrator\Desktop\logs.zip]
[jira] [Issue Comment Deleted] (YARN-7176) After kill command is send, the job hangs
[ https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie updated YARN-7176:

Comment: was deleted (was: [^C:\Users\Administrator\Desktop\logs.zip])
[jira] [Created] (YARN-7176) After kill command is send, the job hangs
lujie created YARN-7176:
---

Summary: After kill command is send, the job hangs
Key: YARN-7176
URL: https://issues.apache.org/jira/browse/YARN-7176
Project: Hadoop YARN
Issue Type: Bug
Components: RM
Affects Versions: 2.6.0
Reporter: lujie
Priority: Critical

I submitted a job but had to kill it immediately for some reason. I then found that the job hangs. I checked the log and found an ArrayIndexOutOfBoundsException and a NullPointerException in the RM log:

{code:java}
2017-09-08 02:34:37,967 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1504809243340_0001_01. Got exception: java.lang.ArrayIndexOutOfBoundsException: 3
    at java.util.ArrayList.add(ArrayList.java:441)
    at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
    at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
    at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
    at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
    at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
    at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2017-09-08 02:34:37,968 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating app: application_1504809243340_0001
java.lang.NullPointerException
    at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
    at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
    at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
    at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
    at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
    at org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
    at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
    at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
    at org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.getSerializedSize(YarnServerResourceManagerRecoveryProtos.java:816)
    at com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62)
    at
[jira] [Updated] (YARN-7176) After kill command is send, the job hangs
[ https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7176: Description: I submitted a job but needed to kill it immediately. I then found that the RM had been killed. I checked the RM log and found an ArrayIndexOutOfBoundsException and a NullPointerException: {code:java} 2017-09-08 02:34:37,967 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1504809243340_0001_01. Got exception: java.lang.ArrayIndexOutOfBoundsException: 3 at java.util.ArrayList.add(ArrayList.java:441) at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330) at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956) at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446) at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121) at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128) at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57) at
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2017-09-08 02:34:37,968 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating app: application_1504809243340_0001 java.lang.NullPointerException at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749) at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530) at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512) at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749) at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530) at 
org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481) at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749) at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530) at org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.getSerializedSize(YarnServerResourceManagerRecoveryProtos.java:816) at com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationStateInternal(FileSystemRMStateStore.java:426) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppTransition.transition(RMStateStore.java:163) at
[jira] [Updated] (YARN-7176) After job kill command is send, the ResourceManager was killed
[ https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7176: Description: I submitted a job but needed to kill it immediately. I then found that the RM had been killed. I checked the RM log and found an ArrayIndexOutOfBoundsException and a NullPointerException: {code:java} 2017-09-08 02:34:37,967 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1504809243340_0001_01. Got exception: java.lang.ArrayIndexOutOfBoundsException: 3 at java.util.ArrayList.add(ArrayList.java:441) at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330) at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956) at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446) at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121) at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128) at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57) at
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2017-09-08 02:34:37,968 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating app: application_1504809243340_0001 java.lang.NullPointerException at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749) at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530) at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512) at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749) at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530) at 
org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481) at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749) at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530) at org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.getSerializedSize(YarnServerResourceManagerRecoveryProtos.java:816) at com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationStateInternal(FileSystemRMStateStore.java:426) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppTransition.transition(RMStateStore.java:163) at
[jira] [Updated] (YARN-7176) After job kill command is send, the ResourceManager was killed
[ https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7176: Description: I submitted a job but needed to kill it immediately. I then found that the RM had been killed. I checked the RM log and found an ArrayIndexOutOfBoundsException and a NullPointerException. According to the log, the RM was killed because of the NullPointerException, but I still don't understand why these exceptions occur. {code:java} 2017-09-08 02:34:37,967 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1504809243340_0001_01. Got exception: java.lang.ArrayIndexOutOfBoundsException: 3 at java.util.ArrayList.add(ArrayList.java:441) at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330) at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956) at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446) at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121) at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128) at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95) at
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2017-09-08 02:34:37,968 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating app: application_1504809243340_0001 java.lang.NullPointerException at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749) at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530) at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512) at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749) at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530) at org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481) at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749) at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530) at org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.getSerializedSize(YarnServerResourceManagerRecoveryProtos.java:816) at com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationStateInternal(FileSystemRMStateStore.java:426) at
[jira] [Updated] (YARN-7176) After job kill command is send, the ResourceManager was killed
[ https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7176: Description: I submitted a job but needed to kill it immediately. I then found that the RM had been killed. I checked the RM log and found an ArrayIndexOutOfBoundsException and a NullPointerException. According to the log, the RM was killed because of the NullPointerException, but I still don't understand why these exceptions occur. I attach the whole RM log. {code:java} 2017-09-08 02:34:37,967 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1504809243340_0001_01. Got exception: java.lang.ArrayIndexOutOfBoundsException: 3 at java.util.ArrayList.add(ArrayList.java:441) at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330) at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956) at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446) at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121) at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128) at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95) at
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72) at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2017-09-08 02:34:37,968 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating app: application_1504809243340_0001 java.lang.NullPointerException at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749) at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530) at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512) at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749) at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530) at org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481) at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749) at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530) at org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.getSerializedSize(YarnServerResourceManagerRecoveryProtos.java:816) at com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62) at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationStateInternal(FileSystemRMStateStore.java:426) at
[jira] [Updated] (YARN-7176) After kill command is send, the ResourceManager was killed
[ https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie updated YARN-7176:
    Summary: After kill command is send, the ResourceManager was killed (was: After kill command is send, the job hangs )

> After kill command is send, the ResourceManager was killed
> ---
>
> Key: YARN-7176
> URL: https://issues.apache.org/jira/browse/YARN-7176
> Project: Hadoop YARN
> Issue Type: Bug
> Components: RM
> Affects Versions: 2.6.0
> Reporter: lujie
> Priority: Critical
> Attachments: logs.rar
>
> I submitted a job but needed to kill it immediately for some reason. I then found that the RM had been killed.
> I checked the log and found an ArrayIndexOutOfBoundsException and a NullPointerException in the RM log:
> {code:java}
> 2017-09-08 02:34:37,967 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1504809243340_0001_01. Got exception:
> java.lang.ArrayIndexOutOfBoundsException: 3
>     at java.util.ArrayList.add(ArrayList.java:441)
>     at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
>     at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
>     at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
>     at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
>     at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
>     at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
>     at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
>     at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
>     at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 2017-09-08 02:34:37,968 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating app: application_1504809243340_0001
> java.lang.NullPointerException
>     at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>     at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>     at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
>     at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>     at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>     at org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
>     at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>     at
[jira] [Updated] (YARN-7176) After job kill command is send, the ResourceManager was killed
[ https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie updated YARN-7176:
    Summary: After job kill command is send, the ResourceManager was killed (was: After kill command is send, the ResourceManager was killed )

> After job kill command is send, the ResourceManager was killed
> ---
>
> Key: YARN-7176
> URL: https://issues.apache.org/jira/browse/YARN-7176
> Project: Hadoop YARN
> Issue Type: Bug
> Components: RM
> Affects Versions: 2.6.0
> Reporter: lujie
> Priority: Critical
> Attachments: logs.rar
>
> I submitted a job but needed to kill it immediately for some reason. I then found that the RM had been killed.
> I checked the log and found an ArrayIndexOutOfBoundsException and a NullPointerException in the RM log:
> {code:java}
> 2017-09-08 02:34:37,967 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1504809243340_0001_01. Got exception:
> java.lang.ArrayIndexOutOfBoundsException: 3
>     at java.util.ArrayList.add(ArrayList.java:441)
>     at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
>     at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
>     at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
>     at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
>     at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
>     at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
>     at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
>     at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
>     at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
>     at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 2017-09-08 02:34:37,968 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating app: application_1504809243340_0001
> java.lang.NullPointerException
>     at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>     at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>     at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
>     at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>     at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>     at org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
>     at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>     at
[jira] [Updated] (YARN-7176) After job kill command is send, the ResourceManager was killed
[ https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie updated YARN-7176:
    Description:
I submitted a job but needed to kill it immediately for some reason. I then found that the RM had been killed.
Checking the RM log, I found an ArrayIndexOutOfBoundsException and a NullPointerException:
{code:java}
2017-09-08 02:34:37,967 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1504809243340_0001_01. Got exception:
java.lang.ArrayIndexOutOfBoundsException: 3
    at java.util.ArrayList.add(ArrayList.java:441)
    at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
    at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
    at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
    at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
    at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
    at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2017-09-08 02:34:37,968 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating app: application_1504809243340_0001
java.lang.NullPointerException
    at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
    at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
    at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
    at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
    at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
    at org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
    at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
    at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
    at org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.getSerializedSize(YarnServerResourceManagerRecoveryProtos.java:816)
    at com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationStateInternal(FileSystemRMStateStore.java:426)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppTransition.transition(RMStateStore.java:163)
    at
[jira] [Updated] (YARN-7176) After job kill command is send, the ResourceManager was killed
[ https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie updated YARN-7176:
    Description:
I submitted a job but needed to kill it immediately for some reason. I then found that the RM had been killed.
I checked the log and found an ArrayIndexOutOfBoundsException and a NullPointerException in the RM log:
{code:java}
2017-09-08 02:34:37,967 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1504809243340_0001_01. Got exception:
java.lang.ArrayIndexOutOfBoundsException: 3
    at java.util.ArrayList.add(ArrayList.java:441)
    at com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
    at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
    at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
    at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
    at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
    at org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
    at org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2017-09-08 02:34:37,968 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error updating app: application_1504809243340_0001
java.lang.NullPointerException
    at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
    at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
    at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
    at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
    at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
    at org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
    at com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
    at com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
    at org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.getSerializedSize(YarnServerResourceManagerRecoveryProtos.java:816)
    at com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationStateInternal(FileSystemRMStateStore.java:426)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppTransition.transition(RMStateStore.java:163)
    at
[jira] [Comment Edited] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266684#comment-16266684 ]

lujie edited comment on YARN-6948 at 11/27/17 12:02 PM:

Is the test failure related to this patch?

was (Author: xiaoheipangzi):
I don't think the test failure is related to this patch.

> Invalid event: ATTEMPT_ADDED at FINAL_SAVING
>
> Key: YARN-6948
> URL: https://issues.apache.org/jira/browse/YARN-6948
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 2.8.0, 3.0.0-alpha4
> Reporter: lujie
> Attachments: yarn-6948.png, yarn-6948.txt
>
> When I send a kill command to a running job and check the logs, I find this exception:
> {code:java}
> 2017-08-03 01:35:20,485 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ATTEMPT_ADDED at FINAL_SAVING
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>     at java.lang.Thread.run(Thread.java:745)
> {code}

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266684#comment-16266684 ]

lujie commented on YARN-6948:
-
I don't think the test failure is related to this patch.

> Invalid event: ATTEMPT_ADDED at FINAL_SAVING
>
> Key: YARN-6948
> URL: https://issues.apache.org/jira/browse/YARN-6948
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 2.8.0, 3.0.0-alpha4
> Reporter: lujie
> Attachments: yarn-6948.png, yarn-6948.txt
>
> When I send a kill command to a running job and check the logs, I find this exception:
> {code:java}
> 2017-08-03 01:35:20,485 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ATTEMPT_ADDED at FINAL_SAVING
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie updated YARN-7563:
    Description:
I sent a kill command to an application, and the NodeManager log shows:
{code:java}
2017-11-25 19:18:48,126 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: couldn't find container container_1511608703018_0001_01_01 while processing FINISH_CONTAINERS event
2017-11-25 19:18:48,146 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: FINISH_APPLICATION at NEW
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
    at java.lang.Thread.run(Thread.java:745)
2017-11-25 19:18:48,151 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1511608703018_0001 transitioned from NEW to INITING
{code}

was:
I send kill command to application, nodemanager log shows:
{code:java}
2017-11-25 19:18:48,126 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: couldn't find container container_1511608703018_0001_01_01 while processing FINISH_CONTAINERS event
2017-11-25 19:18:48,146 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: FINISH_APPLICATION at NEW
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
    at java.lang.Thread.run(Thread.java:745)
2017-11-25 19:18:48,151 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1511608703018_0001 transitioned from NEW to INITING
{code}

> Invalid event: FINISH_APPLICATION at NEW
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.0.0-beta1
> Reporter: lujie
>
> I sent a kill command to an application, and the NodeManager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: couldn't find container container_1511608703018_0001_01_01 while processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: FINISH_APPLICATION at NEW
>     at
[jira] [Created] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
lujie created YARN-7563:
---

Summary: Invalid event: FINISH_APPLICATION at NEW
Key: YARN-7563
URL: https://issues.apache.org/jira/browse/YARN-7563
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 3.0.0-beta1
Reporter: lujie

I sent a kill command to an application, and the NodeManager log shows:
{code:java}
2017-11-25 19:18:48,126 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: couldn't find container container_1511608703018_0001_01_01 while processing FINISH_CONTAINERS event
2017-11-25 19:18:48,146 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: FINISH_APPLICATION at NEW
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
    at java.lang.Thread.run(Thread.java:745)
2017-11-25 19:18:48,151 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1511608703018_0001 transitioned from NEW to INITING
{code}
[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie updated YARN-7563:
    Attachment: (was: YARN-7563.png)

> Invalid event: FINISH_APPLICATION at NEW
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.0.0-beta1
> Reporter: lujie
> Attachments: YARN-7536.png
>
> I sent a kill command to an application, and the NodeManager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: couldn't find container container_1511608703018_0001_01_01 while processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: FINISH_APPLICATION at NEW
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
>     at java.lang.Thread.run(Thread.java:745)
> 2017-11-25 19:18:48,151 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1511608703018_0001 transitioned from NEW to INITING
> {code}
[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lujie updated YARN-7563:
    Attachment: (was: YARN-7536.png)
> Invalid event: FINISH_APPLICATION at NEW
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.0.0-beta1
> Reporter: lujie
>
> I sent a kill command to an application, and the NodeManager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: couldn't find container container_1511608703018_0001_01_01 while processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: FINISH_APPLICATION at NEW
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
>     at java.lang.Thread.run(Thread.java:745)
> 2017-11-25 19:18:48,151 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1511608703018_0001 transitioned from NEW to INITING
> {code}
--
This message was sent by Atlassian JIRA (v6.4.14#64029)
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
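The log above reads like a table-driven state machine rejecting an event that has no transition registered for the current state. Below is a minimal Java sketch of that behaviour, not the actual Hadoop code: the class `AppStateMachineSketch`, its three states, and the use of `IllegalStateException` in place of `InvalidStateTransitionException` are all assumptions made for illustration.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical, heavily simplified stand-in for the NodeManager application state machine. */
final class AppStateMachineSketch {
    enum State { NEW, INITING, FINISHED }
    enum Event { INIT_APPLICATION, FINISH_APPLICATION }

    // Transition table: current state -> (event -> next state).
    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.NEW;

    AppStateMachineSketch() {
        register(State.NEW, Event.INIT_APPLICATION, State.INITING);
        register(State.INITING, Event.FINISH_APPLICATION, State.FINISHED);
        // Deliberately no (NEW, FINISH_APPLICATION) entry, mirroring the reported gap.
    }

    private void register(State from, Event on, State to) {
        table.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class)).put(on, to);
    }

    /** Applies the event, or throws and leaves the state unchanged if no transition exists. */
    State handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }

    State state() {
        return current;
    }

    public static void main(String[] args) {
        AppStateMachineSketch app = new AppStateMachineSketch();
        try {
            app.handle(Event.FINISH_APPLICATION);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "Invalid event: FINISH_APPLICATION at NEW"
        }
        System.out.println(app.handle(Event.INIT_APPLICATION)); // prints "INITING"
    }
}
```

The sketch leaves the state unchanged when an event is rejected, which is consistent with the log: the warning is followed by an ordinary NEW to INITING transition.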
[jira] [Comment Edited] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266987#comment-16266987 ]
lujie edited comment on YARN-7563 at 11/27/17 4:06 PM:
I have found the cause by analyzing the code and logs. !YARN-7536.png! The figure above shows the cause: the client submits an application and then sends a kill command. The NM starts the container through ContainerManagerImpl.startContainerInternal, which (1) puts the appID in the context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns through ResourceTrackerService.nodeHeartbeat that the app needs to be cleaned up and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl first (2) checks whether the appID exists in the context and, if it does, (3) sends FINISH_APPLICATION. The bug manifests only when two conditions hold: (1) happens before (2), and (3) happens before (4). If either condition is violated, the bug stays hidden. I need to check the ApplicationImpl code further to determine whether AppFinishTriggeredTransition is needed to fix this bug.
was (Author: xiaoheipangzi):
I have found the cause by analyzing the code and logs. [^YARN-7536.png] The figure above shows the cause: the client submits an application and then sends a kill command. The NM starts the container through ContainerManagerImpl.startContainerInternal, which (1) puts the appID in the context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns through ResourceTrackerService.nodeHeartbeat that the app needs to be cleaned up and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl first (2) checks whether the appID exists in the context and, if it does, (3) sends FINISH_APPLICATION. The bug manifests only when two conditions hold: (1) happens before (2), and (3) happens before (4). If either condition is violated, the bug stays hidden. I need to check the ApplicationImpl code further to determine whether AppFinishTriggeredTransition is needed to fix this bug.
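The ordering argument in the comment above can be checked mechanically. The sketch below replays steps (1) to (4) in a given order against a simplified model. `FinishAtNewRaceSketch`, `replay`, the `Outcome` names, and the use of a set and a FIFO queue to stand in for the NM context and the dispatcher are hypothetical and not taken from the Hadoop source. It also assumes that (1) always precedes (4) and (2) always precedes (3), since each pair is described as running in order on its own code path.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/** Hypothetical model of the race described in the comment; not the actual NodeManager code. */
final class FinishAtNewRaceSketch {
    enum Outcome { INVALID_EVENT_AT_NEW, FINISH_HANDLED, FINISH_NOT_SENT }

    /** Replays steps (1)-(4) in the given order and reports what the application would see. */
    static Outcome replay(int... order) {
        Set<String> context = new HashSet<>();         // stands in for the NM context's app map
        Queue<String> dispatcher = new ArrayDeque<>(); // stands in for the event dispatcher queue
        boolean appFound = false;
        for (int step : order) {
            switch (step) {
                case 1: context.add("app_0001"); break;                            // (1) put appID in context
                case 2: appFound = context.contains("app_0001"); break;            // (2) check appID in context
                case 3: if (appFound) dispatcher.add("FINISH_APPLICATION"); break; // (3) send FINISH_APPLICATION
                case 4: dispatcher.add("INIT_APPLICATION"); break;                 // (4) send INIT_APPLICATION
                default: throw new IllegalArgumentException("unknown step " + step);
            }
        }
        // Deliver the queued events in FIFO order to a two-transition state machine.
        String state = "NEW";
        Outcome outcome = Outcome.FINISH_NOT_SENT;
        for (String event : dispatcher) {
            if (event.equals("INIT_APPLICATION") && state.equals("NEW")) {
                state = "INITING";
            } else if (event.equals("FINISH_APPLICATION")) {
                outcome = state.equals("NEW") ? Outcome.INVALID_EVENT_AT_NEW : Outcome.FINISH_HANDLED;
            }
        }
        return outcome;
    }

    public static void main(String[] args) {
        // All interleavings that keep (1) before (4) and (2) before (3).
        int[][] interleavings = {
            {1, 4, 2, 3}, {1, 2, 4, 3}, {1, 2, 3, 4},
            {2, 1, 4, 3}, {2, 1, 3, 4}, {2, 3, 1, 4}
        };
        for (int[] order : interleavings) {
            System.out.println(Arrays.toString(order) + " -> " + replay(order));
        }
    }
}
```

Of the six interleavings that respect those per-path orders, only 1-2-3-4 yields `INVALID_EVENT_AT_NEW` in this model, which matches the two conditions stated in the comment.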
[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7563: Attachment: YARN-7536.png
[jira] [Comment Edited] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266987#comment-16266987 ]
lujie edited comment on YARN-7563 at 11/27/17 4:07 PM:
I have found the cause by analyzing the code and logs. !YARN-7563.png! The figure above shows the cause: the client submits an application and then sends a kill command. The NM starts the container through ContainerManagerImpl.startContainerInternal, which (1) puts the appID in the context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns through ResourceTrackerService.nodeHeartbeat that the app needs to be cleaned up and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl first (2) checks whether the appID exists in the context and, if it does, (3) sends FINISH_APPLICATION. The bug manifests only when two conditions hold: (1) happens before (2), and (3) happens before (4). If either condition is violated, the bug stays hidden. I need to check the ApplicationImpl code further to determine whether AppFinishTriggeredTransition is needed to fix this bug.
was (Author: xiaoheipangzi):
I have found the cause by analyzing the code and logs. !YARN-7536.png! The figure above shows the cause: the client submits an application and then sends a kill command. The NM starts the container through ContainerManagerImpl.startContainerInternal, which (1) puts the appID in the context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns through ResourceTrackerService.nodeHeartbeat that the app needs to be cleaned up and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl first (2) checks whether the appID exists in the context and, if it does, (3) sends FINISH_APPLICATION. The bug manifests only when two conditions hold: (1) happens before (2), and (3) happens before (4). If either condition is violated, the bug stays hidden. I need to check the ApplicationImpl code further to determine whether AppFinishTriggeredTransition is needed to fix this bug.
[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7563: Attachment: YARN-7563.png
[jira] [Comment Edited] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266987#comment-16266987 ]
lujie edited comment on YARN-7563 at 11/27/17 4:10 PM:
I have found the cause by analyzing the code and logs. [^YARN-7563.png] The figure above shows the cause: the client submits an application and then sends a kill command. The NM starts the container through ContainerManagerImpl.startContainerInternal, which (1) puts the appID in the context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns through ResourceTrackerService.nodeHeartbeat that the app needs to be cleaned up and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl first (2) checks whether the appID exists in the context and, if it does, then (3) sends FINISH_APPLICATION. The bug manifests only when two conditions hold: (1) happens before (2), and (3) happens before (4). If either condition is violated, the bug stays hidden. I need to check the ApplicationImpl code further to determine whether AppFinishTriggeredTransition is needed to fix this bug.
was (Author: xiaoheipangzi):
I have found the cause by analyzing the code and logs. [^YARN-7563.png] The figure above shows the cause: the client submits an application and then sends a kill command. The NM starts the container through ContainerManagerImpl.startContainerInternal, which (1) puts the appID in the context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns through ResourceTrackerService.nodeHeartbeat that the app needs to be cleaned up and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl first (2) checks whether the appID exists in the context and, if it does, (3) sends FINISH_APPLICATION. The bug manifests only when two conditions hold: (1) happens before (2), and (3) happens before (4). If either condition is violated, the bug stays hidden. I need to check the ApplicationImpl code further to determine whether AppFinishTriggeredTransition is needed to fix this bug.
[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7563: Attachment: YARN-7536.png
[jira] [Comment Edited] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266987#comment-16266987 ]
lujie edited comment on YARN-7563 at 11/27/17 4:05 PM:
I have found the cause by analyzing the code and logs. [^YARN-7536.png] The figure above shows the cause: the client submits an application and then sends a kill command. The NM starts the container through ContainerManagerImpl.startContainerInternal, which (1) puts the appID in the context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns through ResourceTrackerService.nodeHeartbeat that the app needs to be cleaned up and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl first (2) checks whether the appID exists in the context and, if it does, (3) sends FINISH_APPLICATION. The bug manifests only when two conditions hold: (1) happens before (2), and (3) happens before (4). If either condition is violated, the bug stays hidden. I need to check the ApplicationImpl code further to determine whether AppFinishTriggeredTransition is needed to fix this bug.
was (Author: xiaoheipangzi):
I have found the cause by analyzing the code and logs. The figure above shows the cause: the client submits an application and then sends a kill command. The NM starts the container through ContainerManagerImpl.startContainerInternal, which (1) puts the appID in the context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns through ResourceTrackerService.nodeHeartbeat that the app needs to be cleaned up and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl first (2) checks whether the appID exists in the context and, if it does, (3) sends FINISH_APPLICATION. The bug manifests only when two conditions hold: (1) happens before (2), and (3) happens before (4). If either condition is violated, the bug stays hidden. I need to check the ApplicationImpl code further to determine whether AppFinishTriggeredTransition is needed to fix this bug.
[jira] [Commented] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266994#comment-16266994 ] lujie commented on YARN-7563: !YARN-7563.png!
[jira] [Comment Edited] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266987#comment-16266987 ]
lujie edited comment on YARN-7563 at 11/27/17 4:08 PM:
I have found the cause by analyzing the code and logs. [^YARN-7563.png] The figure above shows the cause: the client submits an application and then sends a kill command. The NM starts the container through ContainerManagerImpl.startContainerInternal, which (1) puts the appID in the context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns through ResourceTrackerService.nodeHeartbeat that the app needs to be cleaned up and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl first (2) checks whether the appID exists in the context and, if it does, (3) sends FINISH_APPLICATION. The bug manifests only when two conditions hold: (1) happens before (2), and (3) happens before (4). If either condition is violated, the bug stays hidden. I need to check the ApplicationImpl code further to determine whether AppFinishTriggeredTransition is needed to fix this bug.
was (Author: xiaoheipangzi):
I have found the cause by analyzing the code and logs. !YARN-7563.png! The figure above shows the cause: the client submits an application and then sends a kill command. The NM starts the container through ContainerManagerImpl.startContainerInternal, which (1) puts the appID in the context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns through ResourceTrackerService.nodeHeartbeat that the app needs to be cleaned up and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl first (2) checks whether the appID exists in the context and, if it does, (3) sends FINISH_APPLICATION. The bug manifests only when two conditions hold: (1) happens before (2), and (3) happens before (4). If either condition is violated, the bug stays hidden. I need to check the ApplicationImpl code further to determine whether AppFinishTriggeredTransition is needed to fix this bug.
[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7563: Attachment: screenshot-1.png
[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7563: Attachment: (was: screenshot-1.png)
[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7563: Attachment: YARN-7563.png
[jira] [Issue Comment Deleted] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7563: Comment: was deleted (was: !YARN-7563.png!)
[jira] [Comment Edited] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266987#comment-16266987 ] lujie edited comment on YARN-7563 at 11/27/17 4:09 PM: I have found the cause by analyzing the code and logs. [^YARN-7563.png] The figure above shows the cause: the client submits an application and then sends a kill command. The NM starts the container through ContainerManagerImpl.startContainerInternal, which (1) puts the appID into the context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns through ResourceTrackerService.nodeHeartbeat that the app needs to be cleaned up and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl first (2) checks whether the appID exists in the context and, if it does, (3) sends FINISH_APPLICATION. The bug manifests only when two conditions hold: (1) happens before (2), and (3) happens before (4). If either condition is violated, the bug stays hidden. I need to check the ApplicationImpl code further to determine whether AppFinishTriggeredTransition is needed to fix this bug. was (Author: xiaoheipangzi): I have found the cause by analyzing the code and logs. The figure above shows the cause: the client submits an application and then sends a kill command. The NM starts the container through ContainerManagerImpl.startContainerInternal, which (1) puts the appID into the context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns through ResourceTrackerService.nodeHeartbeat that the app needs to be cleaned up and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl first (2) checks whether the appID exists in the context and, if it does, (3) sends FINISH_APPLICATION. The bug manifests only when two conditions hold: (1) happens before (2), and (3) happens before (4). If either condition is violated, the bug stays hidden. I need to check the ApplicationImpl code further to determine whether AppFinishTriggeredTransition is needed to fix this bug.
[jira] [Comment Edited] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266987#comment-16266987 ] lujie edited comment on YARN-7563 at 11/27/17 4:04 PM: I have found the cause by analyzing the code and logs. The figure above shows the cause: the client submits an application and then sends a kill command. The NM starts the container through ContainerManagerImpl.startContainerInternal, which (1) puts the appID into the context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns through ResourceTrackerService.nodeHeartbeat that the app needs to be cleaned up and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl first (2) checks whether the appID exists in the context and, if it does, (3) sends FINISH_APPLICATION. The bug manifests only when two conditions hold: (1) happens before (2), and (3) happens before (4). If either condition is violated, the bug stays hidden. I need to check the ApplicationImpl code further to determine whether AppFinishTriggeredTransition is needed to fix this bug. was (Author: xiaoheipangzi): I have found the cause by analyzing the code and logs. The figure above shows the cause: the client submits an application and then sends a kill command. The NM starts the container through ContainerManagerImpl.startContainerInternal, which (1) puts the appID into the context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns through ResourceTrackerService.nodeHeartbeat that the app needs to be cleaned up and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl first (2) checks whether the appID exists in the context and, if it does, (3) sends FINISH_APPLICATION. The bug manifests only when two conditions hold: (1) happens before (2), and (3) happens before (4). If either condition is violated, the bug stays hidden. I need to check the ApplicationImpl code further to determine whether AppFinishTriggeredTransition is needed to fix this bug.
[jira] [Commented] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266987#comment-16266987 ] lujie commented on YARN-7563: I have found the cause by analyzing the code and logs. !YARN-7536.png! The figure above shows the cause: the client submits an application and then sends a kill command. The NM starts the container through ContainerManagerImpl.startContainerInternal, which (1) puts the appID into the context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns through ResourceTrackerService.nodeHeartbeat that the app needs to be cleaned up and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl first (2) checks whether the appID exists in the context and, if it does, (3) sends FINISH_APPLICATION. The bug manifests only when two conditions hold: (1) happens before (2), and (3) happens before (4). If either condition is violated, the bug stays hidden. I need to check the ApplicationImpl code further to determine whether AppFinishTriggeredTransition is needed to fix this bug.
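The ordering described in the YARN-7563 comment can be replayed deterministically with a toy state machine. This is only a minimal sketch: the class `AppEventOrderSketch`, the `replay` helper, the transition table, and the `FINISHING` state are hypothetical stand-ins for illustration, not Hadoop's `StateMachineFactory` or the real `ApplicationImpl` states. It assumes that `NEW` accepts only `INIT_APPLICATION`, which is what the logged exception suggests.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class AppEventOrderSketch {
    // Hypothetical, simplified states and events (not the full ApplicationImpl set).
    enum State { NEW, INITING, FINISHING }
    enum Event { INIT_APPLICATION, FINISH_APPLICATION }

    // Assumed transition table: NEW only accepts INIT_APPLICATION.
    static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);
    static {
        Map<Event, State> fromNew = new EnumMap<>(Event.class);
        fromNew.put(Event.INIT_APPLICATION, State.INITING);
        TABLE.put(State.NEW, fromNew);

        Map<Event, State> fromIniting = new EnumMap<>(Event.class);
        fromIniting.put(Event.FINISH_APPLICATION, State.FINISHING);
        TABLE.put(State.INITING, fromIniting);
    }

    /** Delivers the events in the given order and records one log line per event. */
    static List<String> replay(Event... events) {
        State state = State.NEW;
        List<String> log = new ArrayList<>();
        for (Event event : events) {
            State next = TABLE
                .getOrDefault(state, Collections.<Event, State>emptyMap())
                .get(event);
            if (next == null) {
                // No arc for this (state, event) pair: the event is rejected, state unchanged.
                log.add("Invalid event: " + event + " at " + state);
            } else {
                log.add(state + " -> " + next);
                state = next;
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // (3) before (4): FINISH_APPLICATION is dispatched while the app is still NEW.
        System.out.println(replay(Event.FINISH_APPLICATION, Event.INIT_APPLICATION));
        // (4) before (3): the kill arrives after initialization has started.
        System.out.println(replay(Event.INIT_APPLICATION, Event.FINISH_APPLICATION));
    }
}
```

In the first replay the finish event is rejected and the application still moves from NEW to INITING afterwards, which matches the order of the WARN and INFO lines in the NodeManager log quoted in the issue.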
[jira] [Comment Edited] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266987#comment-16266987 ] lujie edited comment on YARN-7563 at 11/27/17 4:03 PM: I have found the cause by analyzing the code and logs. The figure above shows the cause: the client submits an application and then sends a kill command. The NM starts the container through ContainerManagerImpl.startContainerInternal, which (1) puts the appID into the context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns through ResourceTrackerService.nodeHeartbeat that the app needs to be cleaned up and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl first (2) checks whether the appID exists in the context and, if it does, (3) sends FINISH_APPLICATION. The bug manifests only when two conditions hold: (1) happens before (2), and (3) happens before (4). If either condition is violated, the bug stays hidden. I need to check the ApplicationImpl code further to determine whether AppFinishTriggeredTransition is needed to fix this bug. was (Author: xiaoheipangzi): I have found the cause by analyzing the code and logs. !YARN-7536.png! The figure above shows the cause: the client submits an application and then sends a kill command. The NM starts the container through ContainerManagerImpl.startContainerInternal, which (1) puts the appID into the context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns through ResourceTrackerService.nodeHeartbeat that the app needs to be cleaned up and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl first (2) checks whether the appID exists in the context and, if it does, (3) sends FINISH_APPLICATION. The bug manifests only when two conditions hold: (1) happens before (2), and (3) happens before (4). If either condition is violated, the bug stays hidden. I need to check the ApplicationImpl code further to determine whether AppFinishTriggeredTransition is needed to fix this bug.
[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7563: Attachment: (was: YARN-7536.png)
[jira] [Updated] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-6948: Attachment: yarn-6948.png > Invalid event: ATTEMPT_ADDED at FINAL_SAVING > > > Key: YARN-6948 > URL: https://issues.apache.org/jira/browse/YARN-6948 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.0 >Reporter: lujie > Attachments: yarn-6948.png > > > When I send kill command to a running job, I check the logs and find the > Exception: > {code:java} > 2017-08-03 01:35:20,485 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ATTEMPT_ADDED at FINAL_SAVING > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: 
yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266480#comment-16266480 ] lujie commented on YARN-6948: - Hi: Recently I restudy this bug, and find the bug reason !yarn-6948.png! When the applicationAttempt performs AttemptstartedTransition, it will send AppAttemptAddedSchedulerEvent to CapacityScheduler and transform to SUBMITTED, then the CapacityScheduler will send ATTEMPT_ADDED back to applicationAttempt, But if client send kill command to applicationAttempt, applicationAttempt will transform to FINAL_SAVING , and if ATTEMPT_ADDED arrives before applicationAttempt chang its state from FINAL_SAVING to KILLED, applicationAttempt will throw InvalidStateTransitonException exception. it will be ok if ATTEMPT_ADDED arrives at KILLED or SUBMITTED > Invalid event: ATTEMPT_ADDED at FINAL_SAVING > > > Key: YARN-6948 > URL: https://issues.apache.org/jira/browse/YARN-6948 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.0 >Reporter: lujie > Attachments: yarn-6948.png > > > When I send kill command to a running job, I check the logs and find the > Exception: > {code:java} > 2017-08-03 01:35:20,485 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ATTEMPT_ADDED at FINAL_SAVING > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757) > at > 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
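The race described in the comment on YARN-6948 can be sketched with a small table-driven state machine. This is only an illustration under stated assumptions: the class `AttemptRaceSketch`, its transition table, and the `ignoreAttemptAddedWhileSaving` flag are hypothetical and much simpler than the real `RMAppAttemptImpl`; only the state and event names are taken from the log and the comment (`SCHEDULED` and `ATTEMPT_UPDATE_SAVED` are added here to make the table complete). Registering a self-transition for ATTEMPT_ADDED at FINAL_SAVING is one possible remedy, not necessarily what the attached patch does.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified model of an attempt state machine; not the Hadoop classes.
class AttemptRaceSketch {
    enum State { NEW, SUBMITTED, SCHEDULED, FINAL_SAVING, KILLED }
    enum Event { START, ATTEMPT_ADDED, KILL, ATTEMPT_UPDATE_SAVED }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.NEW;

    AttemptRaceSketch(boolean ignoreAttemptAddedWhileSaving) {
        add(State.NEW, Event.START, State.SUBMITTED);
        add(State.SUBMITTED, Event.ATTEMPT_ADDED, State.SCHEDULED);
        add(State.SUBMITTED, Event.KILL, State.FINAL_SAVING);
        add(State.FINAL_SAVING, Event.ATTEMPT_UPDATE_SAVED, State.KILLED);
        // A late ATTEMPT_ADDED at KILLED is ignored (self-transition).
        add(State.KILLED, Event.ATTEMPT_ADDED, State.KILLED);
        if (ignoreAttemptAddedWhileSaving) {
            // Assumed remedy: also ignore the event while the final state is being saved.
            add(State.FINAL_SAVING, Event.ATTEMPT_ADDED, State.FINAL_SAVING);
        }
    }

    private void add(State from, Event on, State to) {
        table.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class)).put(on, to);
    }

    // Applies one event; throws if no transition is registered for (current, event).
    State handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }

    // Replays the interleaving from the comment: the kill is handled before ATTEMPT_ADDED arrives.
    static String replay(boolean ignoreAttemptAddedWhileSaving) {
        AttemptRaceSketch attempt = new AttemptRaceSketch(ignoreAttemptAddedWhileSaving);
        try {
            attempt.handle(Event.START);                // NEW -> SUBMITTED
            attempt.handle(Event.KILL);                 // SUBMITTED -> FINAL_SAVING
            attempt.handle(Event.ATTEMPT_ADDED);        // scheduler's reply arrives late
            attempt.handle(Event.ATTEMPT_UPDATE_SAVED); // FINAL_SAVING -> KILLED
            return attempt.current.toString();
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replay(false)); // Invalid event: ATTEMPT_ADDED at FINAL_SAVING
        System.out.println(replay(true));  // KILLED
    }
}
```

With the extra self-transition the late event is absorbed and the attempt still reaches KILLED; without it, the replay stops at the same "Invalid event" message that appears in the ResourceManager log.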
[jira] [Updated] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-6948: Attachment: yarn-6948.txt > Invalid event: ATTEMPT_ADDED at FINAL_SAVING > > > Key: YARN-6948 > URL: https://issues.apache.org/jira/browse/YARN-6948 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.0 >Reporter: lujie > Attachments: yarn-6948.png, yarn-6948.txt > > > When I send kill command to a running job, I check the logs and find the > Exception: > {code:java} > 2017-08-03 01:35:20,485 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ATTEMPT_ADDED at FINAL_SAVING > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional 
commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266480#comment-16266480 ] lujie edited comment on YARN-6948 at 11/27/17 8:11 AM: --- Hi, I recently restudied this bug and found its cause: !yarn-6948.png! When the applicationAttempt performs AttemptStartedTransition, it sends an AppAttemptAddedSchedulerEvent to the CapacityScheduler and transitions to SUBMITTED; the CapacityScheduler then sends ATTEMPT_ADDED back to the applicationAttempt. But if the client sends a kill command in the meantime, the applicationAttempt transitions to FINAL_SAVING, and if ATTEMPT_ADDED arrives before the applicationAttempt changes its state from FINAL_SAVING to KILLED, the applicationAttempt throws InvalidStateTransitonException. There is no problem if ATTEMPT_ADDED arrives at KILLED (where the event is ignored) or SUBMITTED. was (Author: xiaoheipangzi): Hi, I recently restudied this bug and found its cause: !yarn-6948.png! When the applicationAttempt performs AttemptStartedTransition, it sends an AppAttemptAddedSchedulerEvent to the CapacityScheduler and transitions to SUBMITTED; the CapacityScheduler then sends ATTEMPT_ADDED back to the applicationAttempt. But if the client sends a kill command in the meantime, the applicationAttempt transitions to FINAL_SAVING, and if ATTEMPT_ADDED arrives before the applicationAttempt changes its state from FINAL_SAVING to KILLED, the applicationAttempt throws InvalidStateTransitonException. 
There is no problem if ATTEMPT_ADDED arrives at KILLED or SUBMITTED. > Invalid event: ATTEMPT_ADDED at FINAL_SAVING > > > Key: YARN-6948 > URL: https://issues.apache.org/jira/browse/YARN-6948 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.0 >Reporter: lujie > Attachments: yarn-6948.png > > > When I send kill command to a running job, I check the logs and find the > Exception: > {code:java} > 2017-08-03 01:35:20,485 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ATTEMPT_ADDED at FINAL_SAVING > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266487#comment-16266487 ] lujie edited comment on YARN-6948 at 11/27/17 8:17 AM: --- I have downloaded the Hadoop source code from GitHub (version 3.1.0-SNAPSHOT) and created a patch: [^yarn-6948.txt] was (Author: xiaoheipangzi): [^yarn-6948.txt] > Invalid event: ATTEMPT_ADDED at FINAL_SAVING > > > Key: YARN-6948 > URL: https://issues.apache.org/jira/browse/YARN-6948 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.0, 3.0.0-alpha4 >Reporter: lujie > Attachments: yarn-6948.png, yarn-6948.txt > > > When I send kill command to a running job, I check the logs and find the > Exception: > {code:java} > 2017-08-03 01:35:20,485 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ATTEMPT_ADDED at FINAL_SAVING > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW may
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7563: Summary: Invalid event: FINISH_APPLICATION at NEW may (was: Invalid event: FINISH_APPLICATION at NEW) > Invalid event: FINISH_APPLICATION at NEW may > > > Key: YARN-7563 > URL: https://issues.apache.org/jira/browse/YARN-7563 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.6.0, 3.0.0-beta1 >Reporter: lujie > Attachments: YARN-7563.png > > > I send kill command to application, nodemanager log shows: > {code:java} > 2017-11-25 19:18:48,126 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > couldn't find container container_1511608703018_0001_01_01 while > processing FINISH_CONTAINERS event > 2017-11-25 19:18:48,146 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > FINISH_APPLICATION at NEW > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501) > at > 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:745) > 2017-11-25 19:18:48,151 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Application application_1511608703018_0001 transitioned from NEW to INITING > {code} > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW may make some application level resource be not cleaned
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7563: Summary: Invalid event: FINISH_APPLICATION at NEW may make some application level resource be not cleaned (was: Invalid event: FINISH_APPLICATION at NEW may) > Invalid event: FINISH_APPLICATION at NEW may make some application level > resource be not cleaned > - > > Key: YARN-7563 > URL: https://issues.apache.org/jira/browse/YARN-7563 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.6.0, 3.0.0-beta1 >Reporter: lujie > Attachments: YARN-7563.png > > > I send kill command to application, nodemanager log shows: > {code:java} > 2017-11-25 19:18:48,126 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > couldn't find container container_1511608703018_0001_01_01 while > processing FINISH_CONTAINERS event > 2017-11-25 19:18:48,146 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > FINISH_APPLICATION at NEW > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508) > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:745) > 2017-11-25 19:18:48,151 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Application application_1511608703018_0001 transitioned from NEW to INITING > {code} > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7563: Affects Version/s: 2.6.0 > Invalid event: FINISH_APPLICATION at NEW > > > Key: YARN-7563 > URL: https://issues.apache.org/jira/browse/YARN-7563 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.6.0, 3.0.0-beta1 >Reporter: lujie > Attachments: YARN-7563.png > > > I send kill command to application, nodemanager log shows: > {code:java} > 2017-11-25 19:18:48,126 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > couldn't find container container_1511608703018_0001_01_01 while > processing FINISH_CONTAINERS event > 2017-11-25 19:18:48,146 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > FINISH_APPLICATION at NEW > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:745) > 2017-11-25 19:18:48,151 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Application application_1511608703018_0001 transitioned from NEW to INITING > {code} > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW may make some application level resource be not cleaned
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7563: Attachment: YARN-7563.txt > Invalid event: FINISH_APPLICATION at NEW may make some application level > resource be not cleaned > - > > Key: YARN-7563 > URL: https://issues.apache.org/jira/browse/YARN-7563 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.6.0, 3.0.0-beta1 >Reporter: lujie > Attachments: YARN-7563.png, YARN-7563.txt > > > I send kill command to application, nodemanager log shows: > {code:java} > 2017-11-25 19:18:48,126 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > couldn't find container container_1511608703018_0001_01_01 while > processing FINISH_CONTAINERS event > 2017-11-25 19:18:48,146 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > FINISH_APPLICATION at NEW > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501) > at > 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:745) > 2017-11-25 19:18:48,151 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Application application_1511608703018_0001 transitioned from NEW to INITING > {code} > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW may make some application level resource be not cleaned
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7563: Attachment: (was: YARN-7563.txt) > Invalid event: FINISH_APPLICATION at NEW may make some application level > resource be not cleaned > - > > Key: YARN-7563 > URL: https://issues.apache.org/jira/browse/YARN-7563 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.6.0, 3.0.0-beta1 >Reporter: lujie > Attachments: YARN-7563.png > > > I send kill command to application, nodemanager log shows: > {code:java} > 2017-11-25 19:18:48,126 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > couldn't find container container_1511608703018_0001_01_01 while > processing FINISH_CONTAINERS event > 2017-11-25 19:18:48,146 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > FINISH_APPLICATION at NEW > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501) > at > 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:745) > 2017-11-25 19:18:48,151 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Application application_1511608703018_0001 transitioned from NEW to INITING > {code} > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW may make some application level resource be not cleaned
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268347#comment-16268347 ] lujie commented on YARN-7563: - I have just attached a patch that contains a unit test demonstrating this bug. I also tried to fix it based on the existing code, but I am not sure whether my solution is good. Please check it and let me know how to fix it better. > Invalid event: FINISH_APPLICATION at NEW may make some application level > resource be not cleaned > - > > Key: YARN-7563 > URL: https://issues.apache.org/jira/browse/YARN-7563 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.6.0, 3.0.0-beta1 >Reporter: lujie > Attachments: YARN-7563.png, YARN-7563.txt > > > I send kill command to application, nodemanager log shows: > {code:java} > 2017-11-25 19:18:48,126 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > couldn't find container container_1511608703018_0001_01_01 while > processing FINISH_CONTAINERS event > 2017-11-25 19:18:48,146 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > FINISH_APPLICATION at NEW > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75) > at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:745) > 2017-11-25 19:18:48,151 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Application application_1511608703018_0001 transitioned from NEW to INITING > {code} > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
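The patch attached to YARN-7563 is not reproduced in this thread. As a rough illustration of why the title says resources may not be cleaned, the following sketch replays the ordering visible in the NodeManager log: FINISH_APPLICATION is logged as invalid at NEW and dropped, and the application then moves from NEW to INITING. Everything here is an assumption made for illustration: the class `DroppedFinishSketch`, the `INIT_APPLICATION` event name, the `FINISHED` state, and the `resourcesCleaned` flag are hypothetical and far simpler than the real `ApplicationImpl`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model; names echo the NodeManager log but this is not ApplicationImpl.
class DroppedFinishSketch {
    enum State { NEW, INITING, FINISHED }
    enum Event { INIT_APPLICATION, FINISH_APPLICATION }

    State state = State.NEW;
    boolean resourcesCleaned = false;
    final List<String> warnings = new ArrayList<>();

    void handle(Event event) {
        if (state == State.NEW && event == Event.INIT_APPLICATION) {
            state = State.INITING;
        } else if (state == State.INITING && event == Event.FINISH_APPLICATION) {
            resourcesCleaned = true; // stands in for application-level cleanup
            state = State.FINISHED;
        } else {
            // Mirrors the WARN in the log: the event is recorded and then dropped.
            warnings.add("Invalid event: " + event + " at " + state);
        }
    }

    // Feeds the events to a fresh application in the given order.
    static DroppedFinishSketch replay(Event... events) {
        DroppedFinishSketch app = new DroppedFinishSketch();
        for (Event event : events) {
            app.handle(event);
        }
        return app;
    }

    public static void main(String[] args) {
        // Ordering seen in the log: the kill reaches the application before it is initialized.
        DroppedFinishSketch racy = replay(Event.FINISH_APPLICATION, Event.INIT_APPLICATION);
        System.out.println(racy.state + " cleaned=" + racy.resourcesCleaned + " " + racy.warnings);

        // Expected ordering: initialization first, then the kill.
        DroppedFinishSketch ordered = replay(Event.INIT_APPLICATION, Event.FINISH_APPLICATION);
        System.out.println(ordered.state + " cleaned=" + ordered.resourcesCleaned + " " + ordered.warnings);
    }
}
```

In the racy ordering the model ends in INITING with nothing cleaned, because the only FINISH_APPLICATION it will ever receive was discarded at NEW.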
[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW may make some application level resource be not cleaned
[ https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7563: Attachment: YARN-7563.txt > Invalid event: FINISH_APPLICATION at NEW may make some application level > resource be not cleaned > - > > Key: YARN-7563 > URL: https://issues.apache.org/jira/browse/YARN-7563 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.6.0, 3.0.0-beta1 >Reporter: lujie > Attachments: YARN-7563.png, YARN-7563.txt > > > I send kill command to application, nodemanager log shows: > {code:java} > 2017-11-25 19:18:48,126 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: > couldn't find container container_1511608703018_0001_01_01 while > processing FINISH_CONTAINERS event > 2017-11-25 19:18:48,146 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > FINISH_APPLICATION at NEW > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501) > at > 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126) > at java.lang.Thread.run(Thread.java:745) > 2017-11-25 19:18:48,151 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: > Application application_1511608703018_0001 transitioned from NEW to INITING > {code} > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6950) Invalid event: LAUNCH_FAILED at FAILED
[ https://issues.apache.org/jira/browse/YARN-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16293626#comment-16293626 ] lujie commented on YARN-6950: - Hi, I found that this bug is a duplicate of YARN-933. > Invalid event: LAUNCH_FAILED at FAILED > -- > > Key: YARN-6950 > URL: https://issues.apache.org/jira/browse/YARN-6950 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.6.0 >Reporter: lujie > Fix For: 2.7.0 > > > A RMAppAttemptImpl fail due to some reason,meanwhile AM fails to launch a > container and send event LAUNCH_FAILED,and the StateMachine can not handle > it: > {code:java} > 2017-07-05 03:33:09,013 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > LAUNCH_FAILED at FAILED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA 
(v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6950) Invalid event: LAUNCH_FAILED at FAILED
[ https://issues.apache.org/jira/browse/YARN-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-6950: Fix Version/s: 2.7.0 > Invalid event: LAUNCH_FAILED at FAILED > -- > > Key: YARN-6950 > URL: https://issues.apache.org/jira/browse/YARN-6950 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.6.0 >Reporter: lujie > Fix For: 2.7.0 > > > A RMAppAttemptImpl fail due to some reason,meanwhile AM fails to launch a > container and send event LAUNCH_FAILED,and the StateMachine can not handle > it: > {code:java} > 2017-07-05 03:33:09,013 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > LAUNCH_FAILED at FAILED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: 
yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-6950) Invalid event: LAUNCH_FAILED at FAILED
[ https://issues.apache.org/jira/browse/YARN-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie resolved YARN-6950. - Resolution: Duplicate > Invalid event: LAUNCH_FAILED at FAILED > -- > > Key: YARN-6950 > URL: https://issues.apache.org/jira/browse/YARN-6950 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.6.0 >Reporter: lujie > Fix For: 2.7.0 > > > A RMAppAttemptImpl fail due to some reason,meanwhile AM fails to launch a > container and send event LAUNCH_FAILED,and the StateMachine can not handle > it: > {code:java} > 2017-07-05 03:33:09,013 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > LAUNCH_FAILED at FAILED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: 
yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-6948: Priority: Major (was: Minor) > Invalid event: ATTEMPT_ADDED at FINAL_SAVING > > > Key: YARN-6948 > URL: https://issues.apache.org/jira/browse/YARN-6948 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.0, 3.0.0-alpha4 >Reporter: lujie >Assignee: lujie >Priority: Major > Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4 > > Attachments: YARN-6948_1.patch, YARN-6948_2.patch, yarn-6948.png, > yarn-6948.txt > > > When I send kill command to a running job, I check the logs and find the > Exception: > {code:java} > 2017-08-03 01:35:20,485 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ATTEMPT_ADDED at FINAL_SAVING > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > {code} -- 
[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED
[ https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7663: Priority: Major (was: Minor) > RMAppImpl:Invalid event: START at KILLED > > > Key: YARN-7663 > URL: https://issues.apache.org/jira/browse/YARN-7663 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: lujie >Assignee: lujie >Priority: Major > Labels: patch > Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4 > > Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, > YARN-7663_4.patch, YARN-7663_5.patch, YARN-7663_6.patch, YARN-7663_7.patch > > > Send kill to application, the RM log shows: > {code:java} > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > START at KILLED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:745) > {code} > If a sleep is inserted just before the START event is created, this bug reproduces deterministically.
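The race reported in YARN-7663 (a START event delivered after the application has already reached KILLED) can be illustrated with a small stand-alone sketch. This is not Hadoop's StateMachineFactory or RMAppImpl: the class AppStateMachineSketch, its three states, and the ignoreStartAfterKill flag are hypothetical names made up for illustration. The sketch only assumes a table-driven state machine that rejects any (state, event) pair with no registered transition, and shows that registering a no-op transition for START at KILLED is one possible way to tolerate the late event.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical miniature state machine; not Hadoop's StateMachineFactory.
class AppStateMachineSketch {
    enum State { NEW, RUNNING, KILLED }
    enum Event { START, KILL }

    // Transition table: current state -> (event -> next state).
    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.NEW;

    AppStateMachineSketch(boolean ignoreStartAfterKill) {
        for (State s : State.values()) {
            table.put(s, new EnumMap<>(Event.class));
        }
        table.get(State.NEW).put(Event.START, State.RUNNING);
        table.get(State.NEW).put(Event.KILL, State.KILLED);
        table.get(State.RUNNING).put(Event.KILL, State.KILLED);
        if (ignoreStartAfterKill) {
            // Assumed mitigation: treat a late START as a no-op once killed.
            table.get(State.KILLED).put(Event.START, State.KILLED);
        }
    }

    // Applies the event, or throws if no transition is registered for it.
    State handle(Event event) {
        State next = table.get(current).get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        // KILL wins the race, then the delayed START arrives.
        AppStateMachineSketch strict = new AppStateMachineSketch(false);
        strict.handle(Event.KILL);
        try {
            strict.handle(Event.START);
        } catch (IllegalStateException e) {
            System.out.println("strict: " + e.getMessage());
        }

        AppStateMachineSketch tolerant = new AppStateMachineSketch(true);
        tolerant.handle(Event.KILL);
        System.out.println("tolerant: " + tolerant.handle(Event.START));
    }
}
```

In the strict variant the second handle call fails in the same way as the log above; in the tolerant variant the machine simply stays in KILLED. Whether ignoring the event is the right behaviour for the real RMAppImpl is a design decision this sketch does not settle.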
[jira] [Updated] (YARN-7786) NullPointerException while launching ApplicationMaster
[ https://issues.apache.org/jira/browse/YARN-7786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7786: Priority: Major (was: Minor) > NullPointerException while launching ApplicationMaster > -- > > Key: YARN-7786 > URL: https://issues.apache.org/jira/browse/YARN-7786 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.9.0, 3.0.0-beta1 >Reporter: lujie >Assignee: lujie >Priority: Major > Fix For: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.3 > > Attachments: YARN-7786.patch, YARN-7786_1.patch, YARN-7786_2.patch, > YARN-7786_3.patch, YARN-7786_4.patch, YARN-7786_5.patch, YARN-7786_6.patch, > resourcemanager.log > > > If a kill command is sent to the job before the ApplicationMaster is launched, a NullPointerException appears in the log: > {code} > 2017-11-25 21:27:25,333 INFO > org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error > launching appattempt_1511616410268_0001_01. Got exception: > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.setupTokens(AMLauncher.java:205) > at > org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.createAMContainerLaunchContext(AMLauncher.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:112) > at > org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:304) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code}
[jira] [Updated] (YARN-8381) Job got stuck while node was unhealthy, but without log messages to indicate such case
[ https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-8381: Description: I started a fresh pseudo-distributed system on an node, then run a job but it stuck. My first reaction was checking log message to local problem, but obtaining no error message. After reading log messages for long time, I waked up to check the node health . The Yarn web UI showed that the nodemanager is unhealthy, due to "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigure the "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98% and solved this problem. {color:#d04437}*But I still  strongly recommend adding error log messages for unhealthy nodemanger(especially startup).*{color} was: I started a fresh pseudo-distributed system on an node, then run a job but it stuck. My first reaction was checking log message to local problem, but obtaining no error message. After reading log messages for long time, I waked up to check the node health . The Yarn web UI showed that the nodemanager is unhealthy, due to "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigure the "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98% and solved this problem. {color:#d04437}*But I still  strongly recommend adding error log messages for unhealthy nodemanger(especially for startup).*{color} > Job got stuck while node was unhealthy, but without log messages to indicate > such case > -- > > Key: YARN-8381 > URL: https://issues.apache.org/jira/browse/YARN-8381 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: lujie >Priority: Major > > I started a fresh pseudo-distributed system on an node, then run a job but > it stuck. My first reaction was checking log message to local problem, but > obtaining no error message. > After reading log messages for long time, I waked up to check the node > health . 
The Yarn web UI showed that the nodemanager is unhealthy, due to > "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigure the > "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" > to 98% and solved this problem. > {color:#d04437}*But I still  strongly recommend adding error log messages for > unhealthy nodemanger(especially startup).*{color}
[jira] [Updated] (YARN-8381) Job got stuck while node was unhealthy, but without log messages to indicate such case
[ https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-8381: Description: I started a fresh pseudo-distributed system on an node, then run a job but it stuck. My first reaction was checking log message to local problem, but obtaining no error message. After reading log messages for long time, I waked up to check the node health . The Yarn web UI showed that the nodemanager is unhealthy, due to "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigure the "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98% and solved this problem. {color:#d04437}*But I still  strongly recommend adding error log messages for unhealthy nodemanger(especially for startup).*{color} was: I started a fresh pseudo-distributed system on an node, then run a job but it stuck. My first reaction was checking log message to local problem, but obtaining no error message. After reading log messages for long time, I waked up to check the node health . The Yarn web UI showed that the nodemanager is unhealthy, due to "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigure the "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98% and solved this problem. {color:#d04437}*But I still  strongly recommend adding error log messages for unhealthy nodemanger.*{color} > Job got stuck while node was unhealthy, but without log messages to indicate > such case > -- > > Key: YARN-8381 > URL: https://issues.apache.org/jira/browse/YARN-8381 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: lujie >Priority: Major > > I started a fresh pseudo-distributed system on an node, then run a job but > it stuck. My first reaction was checking log message to local problem, but > obtaining no error message. > After reading log messages for long time, I waked up to check the node > health . 
The Yarn web UI showed that the nodemanager is unhealthy, due to > "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigure the > "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" > to 98% and solved this problem. > {color:#d04437}*But I still  strongly recommend adding error log messages for > unhealthy nodemanger(especially for startup).*{color}
[jira] [Created] (YARN-8381) Job get stuck while node is unhealthy, but without log messages to indicate such case
lujie created YARN-8381: --- Summary: Job get stuck while node is unhealthy, but without log messages to indicate such case Key: YARN-8381 URL: https://issues.apache.org/jira/browse/YARN-8381 Project: Hadoop YARN Issue Type: Improvement Reporter: lujie I started a fresh pseudo-distributed system on an node, then run a job but it stuck. My first reaction was checking log message to local problem, but obtaining no error message. Then I waked up to check the node health after reading log message for long time. The Yarn web UI showed that the nodemanager is unhealthy, due to the "l{{ocal-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigure the "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98% and solved this problem. But I still  strongly recommend adding error log messages for unhealthy nodemanger. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8381) Job got stuck while node is unhealthy, but without log messages to indicate such case
[ https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-8381: Summary: Job got stuck while node is unhealthy, but without log messages to indicate such case (was: Job get stuck while node is unhealthy, but without log messages to indicate such case) > Job got stuck while node is unhealthy, but without log messages to indicate > such case > - > > Key: YARN-8381 > URL: https://issues.apache.org/jira/browse/YARN-8381 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: lujie >Priority: Major > > I started a fresh pseudo-distributed system on an node, then run a job but > it stuck. My first reaction was checking log message to local problem, but > obtaining no error message. Then I waked up to check the node health after > reading log message for long time. The Yarn web UI showed that the > nodemanager is unhealthy, due to the "l{{ocal-dirs are bad: > /tmp/hadoop-hduser/nm-local-dir}}". I reconfigure the > "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" > to 98% and solved this problem. But I still  strongly recommend adding error > log messages for unhealthy nodemanger. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8381) Job got stuck while node was unhealthy, but without log messages to indicate such case
[ https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-8381: Description: I started a fresh pseudo-distributed system on an node, then run a job but it stuck. My first reaction was checking log message to local problem, but obtaining no error message. After reading log messages for long time, I waked up to check the node health . The Yarn web UI showed that the nodemanager is unhealthy, due to the "l\{{ocal-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigure the "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98% and solved this problem. But I still  strongly recommend adding error log messages for unhealthy nodemanger. was:I started a fresh pseudo-distributed system on an node, then run a job but it stuck. My first reaction was checking log message to local problem, but obtaining no error message. Then I waked up to check the node health after reading log message for long time. The Yarn web UI showed that the nodemanager is unhealthy, due to the "l{{ocal-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigure the "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98% and solved this problem. But I still  strongly recommend adding error log messages for unhealthy nodemanger. > Job got stuck while node was unhealthy, but without log messages to indicate > such case > -- > > Key: YARN-8381 > URL: https://issues.apache.org/jira/browse/YARN-8381 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: lujie >Priority: Major > > I started a fresh pseudo-distributed system on an node, then run a job but > it stuck. My first reaction was checking log message to local problem, but > obtaining no error message. > After reading log messages for long time, I waked up to check the node > health . 
The Yarn web UI showed that the nodemanager is unhealthy, due to the > "l\{{ocal-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigure > the > "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" > to 98% and solved this problem. But I still  strongly recommend adding error > log messages for unhealthy nodemanger.
[jira] [Updated] (YARN-8381) Job got stuck while node was unhealthy, but without log messages to indicate such case
[ https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-8381: Summary: Job got stuck while node was unhealthy, but without log messages to indicate such case (was: Job got stuck while node is unhealthy, but without log messages to indicate such case) > Job got stuck while node was unhealthy, but without log messages to indicate > such case > -- > > Key: YARN-8381 > URL: https://issues.apache.org/jira/browse/YARN-8381 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: lujie >Priority: Major > > I started a fresh pseudo-distributed system on an node, then run a job but > it stuck. My first reaction was checking log message to local problem, but > obtaining no error message. Then I waked up to check the node health after > reading log message for long time. The Yarn web UI showed that the > nodemanager is unhealthy, due to the "l{{ocal-dirs are bad: > /tmp/hadoop-hduser/nm-local-dir}}". I reconfigure the > "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" > to 98% and solved this problem. But I still  strongly recommend adding error > log messages for unhealthy nodemanger. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8381) Job got stuck while node was unhealthy, but without log messages to indicate such case
[ https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-8381: Description: I started a fresh pseudo-distributed system on an node, then run a job but it stuck. My first reaction was checking log message to local problem, but obtaining no error message. After reading log messages for long time, I waked up to check the node health . The Yarn web UI showed that the nodemanager is unhealthy, due to "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigure the "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98% and solved this problem. {color:#d04437}*But I still  strongly recommend adding error log messages for unhealthy nodemanger.*{color} was: I started a fresh pseudo-distributed system on an node, then run a job but it stuck. My first reaction was checking log message to local problem, but obtaining no error message. After reading log messages for long time, I waked up to check the node health . The Yarn web UI showed that the nodemanager is unhealthy, due to the "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigure the "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98% and solved this problem. {color:#d04437}*But I still  strongly recommend adding error log messages for unhealthy nodemanger.*{color} > Job got stuck while node was unhealthy, but without log messages to indicate > such case > -- > > Key: YARN-8381 > URL: https://issues.apache.org/jira/browse/YARN-8381 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: lujie >Priority: Major > > I started a fresh pseudo-distributed system on an node, then run a job but > it stuck. My first reaction was checking log message to local problem, but > obtaining no error message. > After reading log messages for long time, I waked up to check the node > health . 
The Yarn web UI showed that the nodemanager is unhealthy, due to > "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigure the > "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" > to 98% and solved this problem. > {color:#d04437}*But I still  strongly recommend adding error log messages for > unhealthy nodemanger.*{color}
[jira] [Updated] (YARN-8381) Job got stuck while node was unhealthy, but without log messages to indicate such case
[ https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-8381: Description: I started a fresh pseudo-distributed system on an node, then run a job but it stuck. My first reaction was checking log message to local problem, but obtaining no error message. After reading log messages for long time, I waked up to check the node health . The Yarn web UI showed that the nodemanager is unhealthy, due to the "l\{{ocal-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigure the "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98% and solved this problem. {color:#d04437}*But I still  strongly recommend adding error log messages for unhealthy nodemanger.*{color} was: I started a fresh pseudo-distributed system on an node, then run a job but it stuck. My first reaction was checking log message to local problem, but obtaining no error message. After reading log messages for long time, I waked up to check the node health . The Yarn web UI showed that the nodemanager is unhealthy, due to the "l\{{ocal-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigure the "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98% and solved this problem. But I still  strongly recommend adding error log messages for unhealthy nodemanger. > Job got stuck while node was unhealthy, but without log messages to indicate > such case > -- > > Key: YARN-8381 > URL: https://issues.apache.org/jira/browse/YARN-8381 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: lujie >Priority: Major > > I started a fresh pseudo-distributed system on an node, then run a job but > it stuck. My first reaction was checking log message to local problem, but > obtaining no error message. > After reading log messages for long time, I waked up to check the node > health . 
The Yarn web UI showed that the nodemanager is unhealthy, due to the > "l\{{ocal-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigure > the > "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" > to 98% and solved this problem. > {color:#d04437}*But I still  strongly recommend adding error log messages for > unhealthy nodemanger.*{color}
[jira] [Updated] (YARN-8381) Job got stuck while node was unhealthy, but without log messages to indicate such case
[ https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-8381: Description: I started a fresh pseudo-distributed system on an node, then run a job but it stuck. My first reaction was checking log message to local problem, but obtaining no error message. After reading log messages for long time, I waked up to check the node health . The Yarn web UI showed that the nodemanager is unhealthy, due to the "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigure the "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98% and solved this problem. {color:#d04437}*But I still  strongly recommend adding error log messages for unhealthy nodemanger.*{color} was: I started a fresh pseudo-distributed system on an node, then run a job but it stuck. My first reaction was checking log message to local problem, but obtaining no error message. After reading log messages for long time, I waked up to check the node health . The Yarn web UI showed that the nodemanager is unhealthy, due to the "l\{{ocal-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigure the "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" to 98% and solved this problem. {color:#d04437}*But I still  strongly recommend adding error log messages for unhealthy nodemanger.*{color} > Job got stuck while node was unhealthy, but without log messages to indicate > such case > -- > > Key: YARN-8381 > URL: https://issues.apache.org/jira/browse/YARN-8381 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: lujie >Priority: Major > > I started a fresh pseudo-distributed system on an node, then run a job but > it stuck. My first reaction was checking log message to local problem, but > obtaining no error message. > After reading log messages for long time, I waked up to check the node > health . 
The Yarn web UI showed that the nodemanager is unhealthy, due to the > "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigure the > "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}" > to 98% and solved this problem. > {color:#d04437}*But I still  strongly recommend adding error log messages for > unhealthy nodemanger.*{color}
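The request in YARN-8381 (emit an error log when a NodeManager directory is judged unhealthy) can be sketched as follows. This is an illustrative stand-alone check, not the NodeManager's actual disk health checker: DiskHealthSketch, utilizationPercent, and isHealthy are hypothetical names, the maxPercent argument merely plays the role of the max-disk-utilization-per-disk-percentage setting quoted above, and java.util.logging stands in for whatever logger the real code uses.

```java
import java.io.File;
import java.util.logging.Logger;

// Hypothetical stand-alone check; not the NodeManager's real health checker.
class DiskHealthSketch {
    private static final Logger LOG = Logger.getLogger(DiskHealthSketch.class.getName());

    // Percentage of the disk in use; a non-positive total is treated as 100% used.
    static double utilizationPercent(long totalBytes, long usableBytes) {
        if (totalBytes <= 0) {
            return 100.0;
        }
        return 100.0 * (totalBytes - usableBytes) / totalBytes;
    }

    // Returns false and logs an error when utilization exceeds the threshold.
    static boolean isHealthy(String dir, long totalBytes, long usableBytes, double maxPercent) {
        double used = utilizationPercent(totalBytes, usableBytes);
        if (used > maxPercent) {
            LOG.severe(String.format(
                "Directory %s is unhealthy: %.1f%% used exceeds the %.1f%% limit",
                dir, used, maxPercent));
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Check the JVM's temp directory against a 98% threshold.
        File dir = new File(System.getProperty("java.io.tmpdir"));
        boolean ok = isHealthy(dir.getPath(), dir.getTotalSpace(), dir.getUsableSpace(), 98.0);
        System.out.println(dir + " healthy at 98% threshold: " + ok);
    }
}
```

The point of the sketch is only that the unhealthy verdict and the error message are produced in the same place, so a user reading the log sees the reason without having to open the web UI.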
[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED
[ https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7663: Attachment: YARN-7663_5.patch Hi: {code:java} Rather than calling createNewTestApp then throwing away the results, it would be cleaner to extend createNewTestApp to take a boolean parameter specifying whether to create an app with invalid state transition detection or without. Alternatively you could factor out the rmContext, scheduler, and conf setup from createNewTestApp so the test can leverage it without needing to do all the other, unrelated stuff in createNewTestApp. {code} After implementing both plans, I chose the second because it adds less code and is cleaner. In the new patch, I factor out as many as possible of the unrelated arguments passed to the RMAppImpl constructor (setting them to null). > RMAppImpl:Invalid event: START at KILLED > > > Key: YARN-7663 > URL: https://issues.apache.org/jira/browse/YARN-7663 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: lujie >Assignee: lujie >Priority: Minor > Labels: patch > Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, > YARN-7663_4.patch, YARN-7663_5.patch > > > Send kill to application, the RM log shows: > {code:java} > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > START at KILLED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:745) > {code} > If a sleep is inserted just before the START event is created, this bug reproduces deterministically.
[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED
[ https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314450#comment-16314450 ] lujie edited comment on YARN-7663 at 1/6/18 8:25 AM: - different from YARN-7663_5.patch 1. Replace fooTestAppNewKill with testAppStartAfterKilled at line 61 2. fix checkstyle error was (Author: xiaoheipangzi): different from YARN-7663_5.patch 1. Replace fooTestAppNewKill with testAppStartAfterKilled 2. fix checkstyle error > RMAppImpl:Invalid event: START at KILLED > > > Key: YARN-7663 > URL: https://issues.apache.org/jira/browse/YARN-7663 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: lujie >Assignee: lujie >Priority: Minor > Labels: patch > Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, > YARN-7663_4.patch, YARN-7663_5.patch, YARN-7663_6.patch > > > Send kill to application, the RM log shows: > {code:java} > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > START at KILLED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at 
java.lang.Thread.run(Thread.java:745) > {code} > if insert sleep before where the START event was created, this bug will > deterministically reproduce. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
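The race described in the report can be illustrated with a toy transition table. This is only a sketch under stated assumptions: `ToyAppStateMachine`, its three states, and its two events are hypothetical stand-ins that borrow names from the log above, and it is not the YARN `StateMachineFactory`. It shows that when KILL is processed first, a late START finds no entry for (KILLED, START).

```java
import java.util.EnumMap;
import java.util.Map;

// Toy model only: names mirror the log above, but this is not YARN code.
class ToyAppStateMachine {
  enum State { NEW, NEW_SAVING, KILLED }
  enum Event { START, KILL }

  // Transition table: (current state, event) -> next state.
  private static final Map<State, Map<Event, State>> TABLE =
      new EnumMap<>(State.class);
  static {
    for (State s : State.values()) {
      TABLE.put(s, new EnumMap<>(Event.class));
    }
    TABLE.get(State.NEW).put(Event.START, State.NEW_SAVING);
    TABLE.get(State.NEW).put(Event.KILL, State.KILLED);
    TABLE.get(State.NEW_SAVING).put(Event.KILL, State.KILLED);
    // There is deliberately no entry for (KILLED, START).
  }

  private State current = State.NEW;

  State getState() {
    return current;
  }

  void handle(Event event) {
    State next = TABLE.get(current).get(event);
    if (next == null) {
      throw new IllegalStateException(
          "Invalid event: " + event + " at " + current);
    }
    current = next;
  }

  public static void main(String[] args) {
    ToyAppStateMachine app = new ToyAppStateMachine();
    app.handle(Event.KILL);      // the kill wins the race
    try {
      app.handle(Event.START);   // the delayed START arrives afterwards
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In this toy model, delaying the START (the "insert sleep" step in the report) simply means the KILL is always handled first, which is why the failure becomes deterministic.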
[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED
[ https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314450#comment-16314450 ] lujie edited comment on YARN-7663 at 1/6/18 8:25 AM: - different from YARN-7663_5.patch 1. Replace fooTestAppNewKill with testAppStartAfterKilled 2. fix checkstyle error was (Author: xiaoheipangzi): different from YARN-7663_6.patch 1. Replace fooTestAppNewKill with testAppStartAfterKilled 2. fix checkstyle error > RMAppImpl:Invalid event: START at KILLED > > > Key: YARN-7663 > URL: https://issues.apache.org/jira/browse/YARN-7663 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: lujie >Assignee: lujie >Priority: Minor > Labels: patch > Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, > YARN-7663_4.patch, YARN-7663_5.patch, YARN-7663_6.patch > > > Send kill to application, the RM log shows: > {code:java} > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > START at KILLED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at 
java.lang.Thread.run(Thread.java:745) > {code} > if insert sleep before where the START event was created, this bug will > deterministically reproduce. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED
[ https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7663: Attachment: YARN-7663_6.patch different from YARN-7663_6.patch 1. Replace fooTestAppNewKill with testAppStartAfterKilled 2. fix checkstyle error > RMAppImpl:Invalid event: START at KILLED > > > Key: YARN-7663 > URL: https://issues.apache.org/jira/browse/YARN-7663 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: lujie >Assignee: lujie >Priority: Minor > Labels: patch > Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, > YARN-7663_4.patch, YARN-7663_5.patch, YARN-7663_6.patch > > > Send kill to application, the RM log shows: > {code:java} > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > START at KILLED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:745) > {code} > if insert sleep before where the START event was created, this bug will > deterministically reproduce. 
[jira] [Updated] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-6948: Attachment: YARN-6948_2.patch > Invalid event: ATTEMPT_ADDED at FINAL_SAVING > > > Key: YARN-6948 > URL: https://issues.apache.org/jira/browse/YARN-6948 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.0, 3.0.0-alpha4 >Reporter: lujie > Attachments: YARN-6948_1.patch, YARN-6948_2.patch, yarn-6948.png, > yarn-6948.txt > > > When I send kill command to a running job, I check the logs and find the > Exception: > {code:java} > 2017-08-03 01:35:20,485 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ATTEMPT_ADDED at FINAL_SAVING > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: 
yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317592#comment-16317592 ] lujie edited comment on YARN-6948 at 1/9/18 6:06 AM: - After discuss [YARN-7663|https://issues.apache.org/jira/browse/YARN-7663] with [#Jason Lowe], I think this bug can have same unit test strategy. [^YARN-6948_1.patch] is not clean and has checkstyle errors, I reattach the [^YARN-6948_2.patch] was (Author: xiaoheipangzi): After discuss [YARN-7663|https://issues.apache.org/jira/browse/YARN-7663] with [#Jason Lowe], I think this bug can have same unit test strategy. YARN-6948_1.patch is not clean and has checkstyle errors, I reattach the YARN-6948_2 > Invalid event: ATTEMPT_ADDED at FINAL_SAVING > > > Key: YARN-6948 > URL: https://issues.apache.org/jira/browse/YARN-6948 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.0, 3.0.0-alpha4 >Reporter: lujie > Attachments: YARN-6948_1.patch, YARN-6948_2.patch, yarn-6948.png, > yarn-6948.txt > > > When I send kill command to a running job, I check the logs and find the > Exception: > {code:java} > 2017-08-03 01:35:20,485 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ATTEMPT_ADDED at FINAL_SAVING > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317592#comment-16317592 ] lujie edited comment on YARN-6948 at 1/9/18 6:04 AM: - After discuss [YARN-7663|https://issues.apache.org/jira/browse/YARN-7663] with [#Jason Lowe], I think this bug can have same unit test strategy. was (Author: xiaoheipangzi): After discuss [YARN-7663|https://issues.apache.org/jira/browse/YARN-7663] with [#Jason Lowe], I think this bug can have same unit test strategy. The only difference is that I override the onInvalidTranstion in a independent class RMAppAttemptImplForTest. And there exists two checksyte errors in my locally running, but i have no idea to fix them, any suggestion? > Invalid event: ATTEMPT_ADDED at FINAL_SAVING > > > Key: YARN-6948 > URL: https://issues.apache.org/jira/browse/YARN-6948 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.0, 3.0.0-alpha4 >Reporter: lujie > Attachments: YARN-6948_1.patch, YARN-6948_2.patch, yarn-6948.png, > yarn-6948.txt > > > When I send kill command to a running job, I check the logs and find the > Exception: > {code:java} > 2017-08-03 01:35:20,485 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ATTEMPT_ADDED at FINAL_SAVING > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317592#comment-16317592 ] lujie edited comment on YARN-6948 at 1/9/18 6:05 AM: - After discuss [YARN-7663|https://issues.apache.org/jira/browse/YARN-7663] with [#Jason Lowe], I think this bug can have same unit test strategy. YARN-6948_1.patch is not clean and has checkstyle errors, I reattach the YARN-6948_2 was (Author: xiaoheipangzi): After discuss [YARN-7663|https://issues.apache.org/jira/browse/YARN-7663] with [#Jason Lowe], I think this bug can have same unit test strategy. > Invalid event: ATTEMPT_ADDED at FINAL_SAVING > > > Key: YARN-6948 > URL: https://issues.apache.org/jira/browse/YARN-6948 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.0, 3.0.0-alpha4 >Reporter: lujie > Attachments: YARN-6948_1.patch, YARN-6948_2.patch, yarn-6948.png, > yarn-6948.txt > > > When I send kill command to a running job, I check the logs and find the > Exception: > {code:java} > 2017-08-03 01:35:20,485 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ATTEMPT_ADDED at FINAL_SAVING > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-6948: Attachment: YARN-6948_1.patch After discuss [YARN-7663|https://issues.apache.org/jira/browse/YARN-7663] with [#Jason Lowe], I think this bug can have same unit test strategy. The only difference is that I override the onInvalidTranstion in a independent class RMAppAttemptImplForTest. > Invalid event: ATTEMPT_ADDED at FINAL_SAVING > > > Key: YARN-6948 > URL: https://issues.apache.org/jira/browse/YARN-6948 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.8.0, 3.0.0-alpha4 >Reporter: lujie > Attachments: YARN-6948_1.patch, yarn-6948.png, yarn-6948.txt > > > When I send kill command to a running job, I check the logs and find the > Exception: > {code:java} > 2017-08-03 01:35:20,485 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: > Can't handle this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > ATTEMPT_ADDED at FINAL_SAVING > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815) > at > 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED
[ https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7663: Attachment: YARN-7663_4.patch
Hi: I have moved the method that performs the assert into a new test, as [#Jason Lowe] suggested. But I am still uncertain about the TODO in the RMAppImpl handle method now that I have added onInvalidStateTransition. Below is the code:
{code:java}
try {
  /* keep the master in sync with the state machine */
  this.stateMachine.doTransition(event.getType(), event);
} catch (InvalidStateTransitionException e) {
  LOG.error("App: " + appID + " can't handle this event at current state", e);
  onInvalidStateTransition(event.getType(), oldState);
  /* TODO fail the application on the failed transition */
}
{code}
The TODO has been in the code for a long time. If it is meaningless, it should be deleted. If it really needs to be implemented, I think the implementation can go in the newly added method (onInvalidStateTransition).
> RMAppImpl:Invalid event: START at KILLED > > > Key: YARN-7663 > URL: https://issues.apache.org/jira/browse/YARN-7663 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: lujie >Assignee: lujie >Priority: Minor > Labels: patch > Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, > YARN-7663_4.patch > > > Send kill to application, the RM log shows: > {code:java} > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > START at KILLED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:745) > {code} > if insert sleep before where the START event was created, this bug will > deterministically reproduce. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
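The test strategy discussed in this comment (an overridable hook that is called when a transition is invalid) could look roughly like the sketch below. Everything here is an assumption made for illustration: `HookedApp`, `HookedAppForTest`, and the tiny state set are hypothetical, and only the try/catch shape follows the snippet quoted in the comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a state-machine owner such as RMAppImpl; not YARN code.
class HookedApp {
  enum State { NEW, STARTED, KILLED }
  enum Event { START, KILL }

  private State state = State.NEW;

  State getState() {
    return state;
  }

  // Same shape as the quoted snippet: log the problem, then call the hook.
  void handle(Event event) {
    State oldState = state;
    try {
      state = doTransition(oldState, event);
    } catch (IllegalStateException e) {
      System.err.println("Can't handle this event at current state: "
          + e.getMessage());
      onInvalidStateTransition(event, oldState);
    }
  }

  private static State doTransition(State from, Event event) {
    if (from == State.NEW && event == Event.START) {
      return State.STARTED;
    }
    if (from != State.KILLED && event == Event.KILL) {
      return State.KILLED;
    }
    throw new IllegalStateException("Invalid event: " + event + " at " + from);
  }

  // Default behaviour: nothing beyond the log line.
  protected void onInvalidStateTransition(Event event, State oldState) {
  }
}

// Test-only subclass that records invalid transitions so a test can assert on them.
class HookedAppForTest extends HookedApp {
  final List<String> invalid = new ArrayList<>();

  @Override
  protected void onInvalidStateTransition(Event event, State oldState) {
    invalid.add(event + " at " + oldState);
  }
}
```

A test can then drive KILL followed by START through `HookedAppForTest` and assert on the recorded list. The default hook stays a no-op, so a sketch like this leaves non-test behaviour unchanged apart from the extra call.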
[jira] [Comment Edited] (YARN-7703) Apps killed from the NEW state are not recorded in the state store
[ https://issues.apache.org/jira/browse/YARN-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312814#comment-16312814 ] lujie edited comment on YARN-7703 at 1/5/18 10:09 AM: -- I have an initial idea for a fix that needs review: when an application receives a KILL event in the NEW state, the current code uses AppKilledTransition, which skips storing the state. We can use {code:java} new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED) {code} to replace AppKilledTransition, and the postState should be changed to FINAL_SAVING. FinalSavingTransition will tell the StateStore to perform the store action, and the StateStore will reply to the application with APP_UPDATE_SAVED. In the unit test TestRMAppTransitions#testAppNewKill, we only need to add the line {color:#d04437}assertAppState(RMAppState.FINAL_SAVING, application);{color} just before calling sendAppUpdateSavedEvent. I will attach a patch after YARN-7663 is fixed, and that patch should also fix another InvalidStateTransitionException (only noting it here). was (Author: xiaoheipangzi): I have an initial idea for a fix that needs review: when an application receives a KILL event in the NEW state, the current code uses AppKilledTransition, which skips storing the state. We can use {code:java} new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED) {code} to replace AppKilledTransition, and the postState should be changed to FINAL_SAVING. FinalSavingTransition will tell the StateStore to perform the store action, and the StateStore will reply to the application with APP_UPDATE_SAVED. In the unit test TestRMAppTransitions#testAppNewKill, we only need to add the line {color:#d04437}assertAppState(RMAppState.FINAL_SAVING, application);{color} before calling sendAppUpdateSavedEvent. I will attach a patch after YARN-7663 is fixed, and that patch should also fix another InvalidStateTransitionException (only noting it here). 
> Apps killed from the NEW state are not recorded in the state store > -- > > Key: YARN-7703 > URL: https://issues.apache.org/jira/browse/YARN-7703 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jason Lowe >Assignee: lujie > > While reviewing YARN-7663 I noticed that apps killed from the NEW state skip > storing anything to the RM state store. That means upon restart and recovery > these apps will not be recovered, so they will simply disappear. That could > be surprising for users. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
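The proposal in the comment above (route a KILL in NEW through FINAL_SAVING so the store is asked to save before the app becomes KILLED) can be sketched as a toy model. The class `FinalSavingSketch`, its `stored` list, and the `targetedFinalState` field are hypothetical and only illustrate the ordering; this is not the RMAppImpl code or the actual `FinalSavingTransition` class.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the proposal; names are hypothetical, not RMAppImpl.
class FinalSavingSketch {
  enum State { NEW, FINAL_SAVING, KILLED }
  enum Event { KILL, APP_UPDATE_SAVED }

  // Stand-in for the RM state store: it just remembers what it was asked to save.
  final List<State> stored = new ArrayList<>();

  private State state = State.NEW;
  // Final state to enter once the store acknowledges the save.
  private State targetedFinalState;

  State getState() {
    return state;
  }

  void handle(Event event) {
    if (state == State.NEW && event == Event.KILL) {
      // Instead of jumping straight to KILLED, remember the target
      // and ask the store to persist it first.
      targetedFinalState = State.KILLED;
      stored.add(State.KILLED);
      state = State.FINAL_SAVING;
    } else if (state == State.FINAL_SAVING
        && event == Event.APP_UPDATE_SAVED) {
      // The store has acknowledged, so the deferred transition completes.
      state = targetedFinalState;
    } else {
      throw new IllegalStateException(
          "Invalid event: " + event + " at " + state);
    }
  }
}
```

In this model, an assertion that the state is FINAL_SAVING belongs between the KILL and the APP_UPDATE_SAVED event, which matches the placement the comment suggests for testAppNewKill.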
[jira] [Comment Edited] (YARN-7703) Apps killed from the NEW state are not recorded in the state store
[ https://issues.apache.org/jira/browse/YARN-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312814#comment-16312814 ] lujie edited comment on YARN-7703 at 1/5/18 10:08 AM: -- I have a initial fix idea which need to be review: While application receive KILL event at NEW state, current code use AppKilledTransition which ignores storing state. We can use {code:java} new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED) {code} to replace AppKilledTransition and the postState should be changed to FINAL_SAVING. FinalSavingTransition will tell StateStore to perform store action. The stateStore will reply APP_UPDATE_SAVED back to application. In unit test TestRMAppTransitions#testAppNewKill, we only need add a line {color:#d04437}assertAppState(RMAppState.FINAL_SAVING, application);{color} before perform sendAppUpdateSavedEvent i would attach a patch after YARN-7663 fixed, and this patch should fix another InvalidStateTransitionException(only mark it here). was (Author: xiaoheipangzi): I have a initial fix idea which need to be review: While application receive KILL event at NEW state, current code use AppKilledTransition which ignores storing state. We can use {code:java} new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED) {code} to replace AppKilledTransition and the postState should be changed to FINAL_SAVING. FinalSavingTransition will tell StateStore to perform store action. The stateStore will reply APP_UPDATE_SAVED back to application. In unit test TestRMAppTransitions#testAppNewKill, we only need add a line :assertAppState(RMAppState.FINAL_SAVING, application); before sendAppUpdateSavedEvent i would attach a patch after YARN-7663 fixed, and this patch should fix another InvalidStateTransitionException(only mark it here). 
> Apps killed from the NEW state are not recorded in the state store > -- > > Key: YARN-7703 > URL: https://issues.apache.org/jira/browse/YARN-7703 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jason Lowe >Assignee: lujie > > While reviewing YARN-7663 I noticed that apps killed from the NEW state skip > storing anything to the RM state store. That means upon restart and recovery > these apps will not be recovered, so they will simply disappear. That could > be surprising for users. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7703) Apps killed from the NEW state are not recorded in the state store
[ https://issues.apache.org/jira/browse/YARN-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312814#comment-16312814 ] lujie edited comment on YARN-7703 at 1/5/18 10:08 AM: -- I have a initial fix idea which need to be review: While application receive KILL event at NEW state, current code use AppKilledTransition which ignores storing state. We can use {code:java} new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED) {code} to replace AppKilledTransition and the postState should be changed to FINAL_SAVING. FinalSavingTransition will tell StateStore to perform store action. The stateStore will reply APP_UPDATE_SAVED back to application. In unit test TestRMAppTransitions#testAppNewKill, we only need add a line :assertAppState(RMAppState.FINAL_SAVING, application); before sendAppUpdateSavedEvent i would attach a patch after YARN-7663 fixed, and this patch should fix another InvalidStateTransitionException(only mark it here). was (Author: xiaoheipangzi): I have a initial fix idea which need to be review: While application receive KILL event at NEW state, current code use AppKilledTransition which ignores storing state. We can use {code:java} new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED) {code} to replace AppKilledTransition and the postState should be changed to FINAL_SAVING. FinalSavingTransition will tell StateStore to perform store action. The stateStore will reply APP_UPDATE_SAVED back to application. In unit test TestRMAppTransitions#testAppNewKill, we only need add a line :assertAppState(RMAppState.FINAL_SAVING, application); i would attach a patch after YARN-7663 fixed, and this patch should fix another InvalidStateTransitionException(only mark it here). 
> Apps killed from the NEW state are not recorded in the state store > -- > > Key: YARN-7703 > URL: https://issues.apache.org/jira/browse/YARN-7703 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jason Lowe >Assignee: lujie > > While reviewing YARN-7663 I noticed that apps killed from the NEW state skip > storing anything to the RM state store. That means upon restart and recovery > these apps will not be recovered, so they will simply disappear. That could > be surprising for users. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED
[ https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312700#comment-16312700 ] lujie edited comment on YARN-7663 at 1/5/18 8:28 AM: - Hi: I have moved the method that performs assert to new test just as [#Jason Lowe] suggest. But I still feel uncertain about the TODO that exists in RMAppImpl handle foo when I add onInvalidStateTransition. Below is the code: {code:java} try { /* keep the master in sync with the state machine */ this.stateMachine.doTransition(event.getType(), event); } catch (InvalidStateTransitionException e) { LOG.error("App: " + appID + " can't handle this event at current state", e); onInvalidStateTransition(event.getType(), oldState); TODO fail the application on the failed transition } {code} The TODO already exists in system for a long time, if this TODO is meaningless, it should be deleted. If it is really needed to implement, I think the implementation can be placed in new added foo(onInvalidStateTransition). was (Author: xiaoheipangzi): Hi: I have moved the method that performs assert to new test just as [#Jason Lowe] suggest. But I still feel uncertain about the TODO that exists in RMAppImpl handle foo when I add onInvalidStateTransition. Below is the code: {code:java} try { /* keep the master in sync with the state machine */ this.stateMachine.doTransition(event.getType(), event); } catch (InvalidStateTransitionException e) { LOG.error("App: " + appID + " can't handle this event at current state", e); onInvalidStateTransition(event.getType(), oldState); {color:red}/* TODO fail the application on the failed transition */{color} } {code} The TODO already exists in system for a long time, if this TODO is meaningless, it should be deleted. If it is really needed to implement, I think the implementation can be placed in new added foo(onInvalidStateTransition). 
> RMAppImpl:Invalid event: START at KILLED > > > Key: YARN-7663 > URL: https://issues.apache.org/jira/browse/YARN-7663 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: lujie >Assignee: lujie >Priority: Minor > Labels: patch > Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, > YARN-7663_4.patch > > > Send kill to application, the RM log shows: > {code:java} > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > START at KILLED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:745) > {code} > if insert sleep before where the START event was created, this bug will > deterministically reproduce. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED
[ https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312700#comment-16312700 ]

lujie edited comment on YARN-7663 at 1/5/18 8:28 AM:

Hi,

I have moved the method that performs the assert into a new test, as [#Jason Lowe] suggested. However, I am still unsure about the TODO in RMAppImpl#handle, next to the place where I added the call to onInvalidStateTransition. Here is the code:

{code:java}
try {
  /* keep the master in sync with the state machine */
  this.stateMachine.doTransition(event.getType(), event);
} catch (InvalidStateTransitionException e) {
  LOG.error("App: " + appID + " can't handle this event at current state", e);
  onInvalidStateTransition(event.getType(), oldState);
  /* TODO fail the application on the failed transition */
}
{code}

This TODO has been in the code for a long time. If it is no longer meaningful, it should be deleted. If it really needs to be implemented, I think the implementation can go into the newly added method onInvalidStateTransition.

was (Author: xiaoheipangzi):
Hi: I have moved the method that performs assert to new test just as [#Jason Lowe] suggest. But I still feel uncertain about the TODO that exists in RMAppImpl handle foo when I add onInvalidStateTransition. Below is the code: {code:java} try { /* keep the master in sync with the state machine */ this.stateMachine.doTransition(event.getType(), event); } catch (InvalidStateTransitionException e) { LOG.error("App: " + appID + " can't handle this event at current state", e); onInvalidStateTransition(event.getType(), oldState); TODO fail the application on the failed transition } {code} The TODO already exists in system for a long time, if this TODO is meaningless, it should be deleted. If it is really needed to implement, I think the implementation can be placed in new added foo(onInvalidStateTransition).
> RMAppImpl:Invalid event: START at KILLED > > > Key: YARN-7663 > URL: https://issues.apache.org/jira/browse/YARN-7663 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: lujie >Assignee: lujie >Priority: Minor > Labels: patch > Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, > YARN-7663_4.patch > > > Send kill to application, the RM log shows: > {code:java} > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > START at KILLED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:745) > {code} > if insert sleep before where the START event was created, this bug will > deterministically reproduce. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7703) Apps killed from the NEW state are not recorded in the state store
[ https://issues.apache.org/jira/browse/YARN-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie reassigned YARN-7703: --- Assignee: lujie > Apps killed from the NEW state are not recorded in the state store > -- > > Key: YARN-7703 > URL: https://issues.apache.org/jira/browse/YARN-7703 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jason Lowe >Assignee: lujie > > While reviewing YARN-7663 I noticed that apps killed from the NEW state skip > storing anything to the RM state store. That means upon restart and recovery > these apps will not be recovered, so they will simply disappear. That could > be surprising for users. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7703) Apps killed from the NEW state are not recorded in the state store
[ https://issues.apache.org/jira/browse/YARN-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie reassigned YARN-7703: --- Assignee: (was: lujie) > Apps killed from the NEW state are not recorded in the state store > -- > > Key: YARN-7703 > URL: https://issues.apache.org/jira/browse/YARN-7703 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jason Lowe > > While reviewing YARN-7663 I noticed that apps killed from the NEW state skip > storing anything to the RM state store. That means upon restart and recovery > these apps will not be recovered, so they will simply disappear. That could > be surprising for users. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED
[ https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312700#comment-16312700 ]

lujie edited comment on YARN-7663 at 1/5/18 8:49 AM:

Hi,

I have moved the method that performs the assert into a new test, as [#Jason Lowe] suggested. However, I am still unsure about the TODO in RMAppImpl#handle, next to the place where I added the call to onInvalidStateTransition. Here is the code:

{code:java}
try {
  /* keep the master in sync with the state machine */
  this.stateMachine.doTransition(event.getType(), event);
} catch (InvalidStateTransitionException e) {
  LOG.error("App: " + appID + " can't handle this event at current state", e);
  onInvalidStateTransition(event.getType(), oldState);
  /* TODO fail the application on the failed transition */
}
{code}

This TODO has been in the code for a very long time. If it is no longer meaningful, it should be deleted. If it really needs to be implemented, I think the implementation can go into the newly added method onInvalidStateTransition.

was (Author: xiaoheipangzi):
Hi: I have moved the method that performs assert to new test just as [#Jason Lowe] suggest. But I still feel uncertain about the TODO that exists in RMAppImpl handle foo when I add onInvalidStateTransition. Below is the code: {code:java} try { /* keep the master in sync with the state machine */ this.stateMachine.doTransition(event.getType(), event); } catch (InvalidStateTransitionException e) { LOG.error("App: " + appID + " can't handle this event at current state", e); onInvalidStateTransition(event.getType(), oldState); /* TODO fail the application on the failed transition*/ } {code} The TODO already exists in system for a long time, if this TODO is meaningless, it should be deleted. If it is really needed to implement, I think the implementation can be placed in new added foo(onInvalidStateTransition).
> RMAppImpl:Invalid event: START at KILLED > > > Key: YARN-7663 > URL: https://issues.apache.org/jira/browse/YARN-7663 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: lujie >Assignee: lujie >Priority: Minor > Labels: patch > Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, > YARN-7663_4.patch > > > Send kill to application, the RM log shows: > {code:java} > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > START at KILLED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:745) > {code} > if insert sleep before where the START event was created, this bug will > deterministically reproduce. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-7703) Apps killed from the NEW state are not recorded in the state store
[ https://issues.apache.org/jira/browse/YARN-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie reassigned YARN-7703: --- Assignee: lujie > Apps killed from the NEW state are not recorded in the state store > -- > > Key: YARN-7703 > URL: https://issues.apache.org/jira/browse/YARN-7703 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jason Lowe >Assignee: lujie > > While reviewing YARN-7663 I noticed that apps killed from the NEW state skip > storing anything to the RM state store. That means upon restart and recovery > these apps will not be recovered, so they will simply disappear. That could > be surprising for users. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7703) Apps killed from the NEW state are not recorded in the state store
[ https://issues.apache.org/jira/browse/YARN-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312814#comment-16312814 ]

lujie commented on YARN-7703:

I have an initial idea for a fix, which needs review. When an application receives a KILL event in the NEW state, the current code uses AppKilledTransition, which does not store any state. We can replace AppKilledTransition with

{code:java}
new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED)
{code}

and change the post state to FINAL_SAVING. FinalSavingTransition tells the state store to perform the store action, and the state store then replies to the application with APP_UPDATE_SAVED. In the unit test TestRMAppTransitions#testAppNewKill, we only need to add one line: assertAppState(RMAppState.FINAL_SAVING, application);

I will attach a patch after YARN-7663 is fixed. That patch should also fix another InvalidStateTransitionException (just noting it here).

> Apps killed from the NEW state are not recorded in the state store > -- > > Key: YARN-7703 > URL: https://issues.apache.org/jira/browse/YARN-7703 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jason Lowe >Assignee: lujie > > While reviewing YARN-7663 I noticed that apps killed from the NEW state skip > storing anything to the RM state store. That means upon restart and recovery > these apps will not be recovered, so they will simply disappear. That could > be surprising for users. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
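The flow proposed in the comment above (KILL at NEW, wait in FINAL_SAVING, then move to KILLED once the store confirms) can be sketched roughly as follows. This is only an illustration under stated assumptions: FinalSavingSketch, StateStore and App are hypothetical, simplified stand-ins and not the real RMAppImpl, RMStateStore or FinalSavingTransition classes, and the store confirmation is sent by hand, much as the unit test would call sendAppUpdateSavedEvent.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: simplified stand-ins, not the real RMAppImpl or RMStateStore. */
class FinalSavingSketch {

  enum State { NEW, FINAL_SAVING, KILLED }

  enum EventType { KILL, APP_UPDATE_SAVED }

  /** Hypothetical in-memory stand-in for the RM state store. */
  static class StateStore {
    final List<String> saved = new ArrayList<>();

    void store(String appId, State finalState) {
      saved.add(appId + "=" + finalState);
    }
  }

  static class App {
    final String appId;
    final StateStore store;
    State state = State.NEW;
    State targetFinalState; // state to enter once the store confirms the save

    App(String appId, StateStore store) {
      this.appId = appId;
      this.store = store;
    }

    void handle(EventType type) {
      if (state == State.NEW && type == EventType.KILL) {
        // Record the final state first, then wait in FINAL_SAVING.
        targetFinalState = State.KILLED;
        store.store(appId, targetFinalState);
        state = State.FINAL_SAVING;
      } else if (state == State.FINAL_SAVING && type == EventType.APP_UPDATE_SAVED) {
        // The store has confirmed, so the app can reach its final state.
        state = targetFinalState;
      } else {
        throw new IllegalStateException("Invalid event: " + type + " at " + state);
      }
    }
  }

  public static void main(String[] args) {
    StateStore store = new StateStore();
    App app = new App("app_1", store);
    app.handle(EventType.KILL);
    System.out.println(app.state);   // FINAL_SAVING
    app.handle(EventType.APP_UPDATE_SAVED);
    System.out.println(app.state);   // KILLED
    System.out.println(store.saved); // [app_1=KILLED]
  }
}
```

The point of the intermediate state is that the kill is written to the store before the app reports KILLED, so a restart in between can still recover it.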
[jira] [Comment Edited] (YARN-7703) Apps killed from the NEW state are not recorded in the state store
[ https://issues.apache.org/jira/browse/YARN-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312814#comment-16312814 ]

lujie edited comment on YARN-7703 at 1/5/18 10:12 AM:

I have an initial idea for a fix, which needs review. When an application receives a KILL event in the NEW state, the current code uses AppKilledTransition, which does not store any state. We can replace AppKilledTransition with

{code:java}
new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED)
{code}

and change the post state to FINAL_SAVING. FinalSavingTransition tells the state store to perform the store action. The state store then replies to the application with APP_UPDATE_SAVED, and the final state is KILLED. In the unit test TestRMAppTransitions#testAppNewKill, we only need to add the line {color:#d04437}assertAppState(RMAppState.FINAL_SAVING, application);{color} just before the call to sendAppUpdateSavedEvent.

I will attach a patch after YARN-7663 is fixed. That patch should also fix another InvalidStateTransitionException (just noting it here).

was (Author: xiaoheipangzi):
I have a initial fix idea which need to be review: While application receive KILL event at NEW state, current code use AppKilledTransition which ignores storing state. We can use {code:java} new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED) {code} to replace AppKilledTransition and the postState should be changed to FINAL_SAVING. FinalSavingTransition will tell StateStore to perform store action. The stateStore will reply APP_UPDATE_SAVED back to application. In unit test TestRMAppTransitions#testAppNewKill, we only need add a line {color:#d04437}assertAppState(RMAppState.FINAL_SAVING, application);{color} just before perform sendAppUpdateSavedEvent i would attach a patch after YARN-7663 fixed, and this patch should fix another InvalidStateTransitionException(only mark it here).
> Apps killed from the NEW state are not recorded in the state store > -- > > Key: YARN-7703 > URL: https://issues.apache.org/jira/browse/YARN-7703 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jason Lowe >Assignee: lujie > > While reviewing YARN-7663 I noticed that apps killed from the NEW state skip > storing anything to the RM state store. That means upon restart and recovery > these apps will not be recovered, so they will simply disappear. That could > be surprising for users. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7703) Apps killed from the NEW state are not recorded in the state store
[ https://issues.apache.org/jira/browse/YARN-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312814#comment-16312814 ]

lujie edited comment on YARN-7703 at 1/5/18 10:12 AM:

I have an initial idea for a fix, which needs review. When an application receives a KILL event in the NEW state, the current code uses AppKilledTransition, which does not store any state. We can replace AppKilledTransition with

{code:java}
new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED)
{code}

and change the post state to FINAL_SAVING. FinalSavingTransition tells the state store to perform the store action. The state store then replies to the application with APP_UPDATE_SAVED, and the state finally changes to KILLED. In the unit test TestRMAppTransitions#testAppNewKill, we only need to add the line {color:#d04437}assertAppState(RMAppState.FINAL_SAVING, application);{color} just before the call to sendAppUpdateSavedEvent.

I will attach a patch after YARN-7663 is fixed. That patch should also fix another InvalidStateTransitionException (just noting it here).

was (Author: xiaoheipangzi):
I have a initial fix idea which need to be review: While application receive KILL event at NEW state, current code use AppKilledTransition which ignores storing state. We can use {code:java} new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED) {code} to replace AppKilledTransition and the postState should be changed to FINAL_SAVING. FinalSavingTransition will tell StateStore to perform store action. The stateStore will reply APP_UPDATE_SAVED back to applicatio and finally state is KILLED. In unit test TestRMAppTransitions#testAppNewKill, we only need add a line {color:#d04437}assertAppState(RMAppState.FINAL_SAVING, application);{color} just before perform sendAppUpdateSavedEvent i would attach a patch after YARN-7663 fixed, and this patch should fix another InvalidStateTransitionException(only mark it here).
> Apps killed from the NEW state are not recorded in the state store > -- > > Key: YARN-7703 > URL: https://issues.apache.org/jira/browse/YARN-7703 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Jason Lowe >Assignee: lujie > > While reviewing YARN-7663 I noticed that apps killed from the NEW state skip > storing anything to the RM state store. That means upon restart and recovery > these apps will not be recovered, so they will simply disappear. That could > be surprising for users. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED
[ https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310909#comment-16310909 ]

lujie edited comment on YARN-7663 at 1/4/18 9:01 AM:

After reading Jason Lowe's useful suggestion, I rewrote the unit test and attached the new patch. In this patch, I do three things:
1. Add an empty protected method, onInvalidStateTransition, and call it in the code block where RMAppImpl handles the InvalidStateTransitionException.
2. Create a new final class, RMAppImplForTest, which overrides onInvalidStateTransition. createNewTestApp now creates an RMAppImplForTest object instead of an RMAppImpl.
3. Fix this bug by ignoring the event.

However, two more invalid state transitions show up while testing:
1. testAppAcceptedFailed: APP_ACCEPTED at state
2. testAppRunningFailed: ACCEPTED, APP_UPDATE_SAVED at state KILLED

These two may be bugs in RMAppImpl, or they may be bugs in TestRMAppTransitions. Would it be better to defer them to another JIRA? Or can we just ignore the events without a test, as in [YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?

was (Author: xiaoheipangzi):
After reading Jason Lowe useful suggestion. I rewrite the unit test and attach the new patch . In this patch ,I do three three things: 1. add empty protected methodon:onInvalidStateTransition, and add its callsite in the code block that RMAppImpl handle InvalidStateTransition2.create a new final class RMAppImplForTest which override onInvalidStateTransition.In createNewTestApp, create RMAppImplForTest object instead of RMAppImpl. 3. fix this bug by ignore the event But there are another two InvalidStateTransition while testing:1.testAppAcceptedFailed:APP_ACCEPTED at state 2.testAppRunningFailed:ACCEPTED,APP_UPDATE_SAVED at state KILLED. These two InvalidStateTransition maybe bugs in RMAppimpl, or may be bugs in TestRMAppTransitions. is it should be better to defer that to another JIRA?
Or can we just ignore the event without test, just as [YARN-4598|https://issues.apache.org/jira/browse/YARN-4598] > RMAppImpl:Invalid event: START at KILLED > > > Key: YARN-7663 > URL: https://issues.apache.org/jira/browse/YARN-7663 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: lujie >Assignee: lujie >Priority: Minor > Labels: patch > Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch > > > Send kill to application, the RM log shows: > {code:java} > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > START at KILLED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:745) > {code} > if insert sleep before where the START event was created, this bug will > deterministically reproduce. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
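For readers who want to picture the test hook described in this comment, here is a minimal sketch. It is an assumption-laden illustration, not the attached patch: InvalidTransitionHookSketch, App, AppForTest and the tiny transition table are hypothetical stand-ins for RMAppImpl, RMAppImplForTest and the real state machine, and the sketch shows the hook firing before the "ignore the event" fix is applied.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: simplified stand-ins, not the real RMAppImpl or its test class. */
class InvalidTransitionHookSketch {

  enum State { NEW, RUNNING, KILLED }

  enum EventType { START, KILL }

  /** Stand-in for InvalidStateTransitionException. */
  static class InvalidTransitionException extends RuntimeException {
    InvalidTransitionException(State state, EventType type) {
      super("Invalid event: " + type + " at " + state);
    }
  }

  /** Stand-in for RMAppImpl: handle() catches the invalid transition and calls the hook. */
  static class App {
    protected State state = State.NEW;

    public void handle(EventType type) {
      State oldState = state;
      try {
        state = doTransition(oldState, type);
      } catch (InvalidTransitionException e) {
        System.err.println("Can't handle this event at current state: " + e.getMessage());
        onInvalidStateTransition(type, oldState);
      }
    }

    /** Tiny transition table; anything not listed is invalid. */
    private State doTransition(State current, EventType type) {
      if (current == State.NEW && type == EventType.START) return State.RUNNING;
      if (current != State.KILLED && type == EventType.KILL) return State.KILLED;
      throw new InvalidTransitionException(current, type);
    }

    /** Intentionally empty in production code; tests override it. */
    protected void onInvalidStateTransition(EventType type, State oldState) {
    }
  }

  /** Test subclass that records every invalid transition so a test can assert on it. */
  static final class AppForTest extends App {
    final List<String> invalid = new ArrayList<>();

    @Override
    protected void onInvalidStateTransition(EventType type, State oldState) {
      invalid.add(type + " at " + oldState);
    }
  }

  public static void main(String[] args) {
    AppForTest app = new AppForTest();
    app.handle(EventType.KILL);   // NEW -> KILLED
    app.handle(EventType.START);  // no transition defined from KILLED
    System.out.println(app.invalid); // [START at KILLED]
  }
}
```

Because the production method is empty, the hook changes nothing at runtime; only the test subclass turns a logged-and-swallowed exception into something a test can fail on.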
[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED
[ https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310909#comment-16310909 ]

lujie edited comment on YARN-7663 at 1/4/18 8:08 AM:

After reading Jason Lowe's useful suggestion, I rewrote the unit test and attached the new patch. In this patch, I do three things:
1. Add an empty protected method, onInvalidStateTransition, and call it in the code block where RMAppImpl handles the InvalidStateTransitionException.
2. Create a new final class, RMAppImplForTest, which overrides onInvalidStateTransition. createNewTestApp now creates an RMAppImplForTest object instead of an RMAppImpl.
3. Fix this bug by ignoring the event.

However, two more invalid state transitions show up while testing:
1. testAppAcceptedFailed: APP_ACCEPTED at state
2. testAppRunningFailed: ACCEPTED, APP_UPDATE_SAVED at state KILLED

These two may be bugs in RMAppImpl, or they may be bugs in TestRMAppTransitions. Would it be better to defer them to another JIRA? Or can we just ignore the events without a test, as in [YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?

was (Author: xiaoheipangzi):
After reading Jason Lowe useful suggestion. I change the unit test and attach the new patch . In this patch ,I do three three things: 1. add empty protected methodon:onInvalidStateTransition, and add its callsite in the code block that RMAppImpl handle InvalidStateTransition2.create a new final class RMAppImplForTest which override onInvalidStateTransition.In createNewTestApp, create RMAppImplForTest object instead of RMAppImpl. 3. fix this bug by ignore the event But there are another two InvalidStateTransition while testing:1.testAppAcceptedFailed:APP_ACCEPTED at state 2.testAppRunningFailed:ACCEPTED,APP_UPDATE_SAVED at state KILLED. These two InvalidStateTransition maybe bugs in RMAppimpl, or may be bugs in TestRMAppTransitions. is it should be better to defer that to another JIRA?
Or can we just ignore the event without test, just as [YARN-4598|https://issues.apache.org/jira/browse/YARN-4598] > RMAppImpl:Invalid event: START at KILLED > > > Key: YARN-7663 > URL: https://issues.apache.org/jira/browse/YARN-7663 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: lujie >Assignee: lujie >Priority: Minor > Labels: patch > Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch > > > Send kill to application, the RM log shows: > {code:java} > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > START at KILLED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:745) > {code} > if insert sleep before where the START event was created, this bug will > deterministically reproduce. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED
[ https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310909#comment-16310909 ]

lujie edited comment on YARN-7663 at 1/4/18 7:39 AM:

After reading Jason Lowe's useful suggestion, I changed the unit test and attached the new patch. In this patch, I do three things:
1. Add an empty protected method, onInvalidStateTransition, and call it in the code block where RMAppImpl handles the InvalidStateTransitionException.
2. Create a new final class, RMAppImplForTest, which overrides onInvalidStateTransition. createNewTestApp now creates an RMAppImplForTest object instead of an RMAppImpl.
3. Fix this bug by ignoring the event.

However, two more invalid state transitions show up while testing:
1. testAppAcceptedFailed: APP_ACCEPTED at state
2. testAppRunningFailed: ACCEPTED, APP_UPDATE_SAVED at state KILLED

These two may be bugs in RMAppImpl, or they may be bugs in TestRMAppTransitions. Would it be better to defer them to another JIRA? Or can we just ignore the events without a test, as in [YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?

was (Author: xiaoheipangzi):
After reading Jason Lowe useful suggestion. I change the unit test and attach the new patch . In this patch ,I do three three things: 1. add empty protected methodon:onInvalidStateTransition, and add its callsite in the code block that RMAppImpl handle InvalidStateTransition2.create a new final class RMAppImplForTest which override onInvalidStateTransition.In createNewTestApp, create RMAppImplForTest object instead of RMAppImpl. 3. fix this bug by ignore the event But there are another two InvalidStateTransition while testing:1.testAppAcceptedFailed:APP_ACCEPTED at state 2.testAppRunningFailed:ACCEPTED,APP_UPDATE_SAVED at state KILLED. These two InvalidStateTransition maybe bugs in RMAppimpl, or may be bugs in TestRMAppTransitions. is it should be better to defer that to another JIRA?
Or can we just ignore the event without test, just as [link YARN-4598|https://issues.apache.org/jira/browse/YARN-4598] > RMAppImpl:Invalid event: START at KILLED > > > Key: YARN-7663 > URL: https://issues.apache.org/jira/browse/YARN-7663 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: lujie >Assignee: lujie >Priority: Minor > Labels: patch > Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch > > > Send kill to application, the RM log shows: > {code:java} > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > START at KILLED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:745) > {code} > if insert sleep before where the START event was created, this bug will > deterministically reproduce. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED
[ https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310909#comment-16310909 ]

lujie edited comment on YARN-7663 at 1/4/18 7:39 AM:

After reading Jason Lowe's useful suggestion, I changed the unit test and attached the new patch. In this patch, I do three things:
1. Add an empty protected method, onInvalidStateTransition, and call it in the code block where RMAppImpl handles the InvalidStateTransitionException.
2. Create a new final class, RMAppImplForTest, which overrides onInvalidStateTransition. createNewTestApp now creates an RMAppImplForTest object instead of an RMAppImpl.
3. Fix this bug by ignoring the event.

However, two more invalid state transitions show up while testing:
1. testAppAcceptedFailed: APP_ACCEPTED at state
2. testAppRunningFailed: ACCEPTED, APP_UPDATE_SAVED at state KILLED

These two may be bugs in RMAppImpl, or they may be bugs in TestRMAppTransitions. Would it be better to defer them to another JIRA? Or can we just ignore the events without a test, as in [YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?

was (Author: xiaoheipangzi):
After reading Jason Lowe useful suggestion. I change the unit test and attach the new patch . In this patch ,I do three three things: 1. add empty protected methodon:onInvalidStateTransition, and add its callsite in the code block that RMAppImpl handle InvalidStateTransition2.create a new final class RMAppImplForTest which override onInvalidStateTransition.In createNewTestApp, create RMAppImplForTest object instead of RMAppImpl. 3. fix this bug by ignore the event But there are another two InvalidStateTransition while testing:1.testAppAcceptedFailed:APP_ACCEPTED at state 2.testAppRunningFailed:ACCEPTED,APP_UPDATE_SAVED at state KILLED. These two InvalidStateTransition maybe bugs in RMAppimpl, or may be bugs in TestRMAppTransitions. ishould be better to defer that to another JIRA?
Or can we just ignore the event without test, just as [link YARN-4598|https://issues.apache.org/jira/browse/YARN-4598] > RMAppImpl:Invalid event: START at KILLED > > > Key: YARN-7663 > URL: https://issues.apache.org/jira/browse/YARN-7663 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: lujie >Assignee: lujie >Priority: Minor > Labels: patch > Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch > > > Send kill to application, the RM log shows: > {code:java} > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > START at KILLED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:745) > {code} > if insert sleep before where the START event was created, this bug will > deterministically reproduce. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED
[ https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie updated YARN-7663:
    Attachment: YARN-7663_3.patch

Following Jason Lowe's useful suggestion, I changed the unit test and attached a new patch. The patch does three things:
1. Adds an empty protected method, onInvalidStateTransition, and calls it from the code block in which RMAppImpl handles an InvalidStateTransition.
2. Creates a new final class, RMAppImplForTest, which overrides onInvalidStateTransition. createNewTestApp now creates an RMAppImplForTest object instead of an RMAppImpl.
3. Fixes this bug by ignoring the event.

However, two other InvalidStateTransitions show up while testing:
1. testAppAcceptedFailed: APP_ACCEPTED at state ACCEPTED
2. testAppRunningFailed: APP_UPDATE_SAVED at state KILLED

These two InvalidStateTransitions may be bugs in RMAppImpl, or they may be bugs in TestRMAppTransitions. Would it be better to defer them to another JIRA? Or can we just ignore the event without a test, as was done in [YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?

> RMAppImpl:Invalid event: START at KILLED
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.8.0
> Reporter: lujie
> Assignee: lujie
> Priority: Minor
> Labels: patch
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch
>
> After a kill is sent to the application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: START at KILLED
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted just before the point where the START event is created, this bug reproduces deterministically.
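The report above points to an ordering problem: the kill is processed first, so the later START event arrives when the application is already KILLED. The sketch below illustrates that ordering and the "ignore the event" style of fix with a toy state machine. It is only an illustration under stated assumptions: SketchState, SketchEvent, SketchApp and RaceSketch are hypothetical names, the transition rules are simplified, and nothing here is taken from RMAppImpl or from the attached patches.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-ins; these types do not exist in Hadoop.
enum SketchState { NEW, RUNNING, KILLED }
enum SketchEvent { START, KILL }

class SketchApp {
    SketchState state = SketchState.NEW;
    final List<String> errors = new ArrayList<>();
    // Events that are silently dropped when they arrive in a given state.
    private final Map<SketchState, EnumSet<SketchEvent>> ignored =
            new EnumMap<>(SketchState.class);

    SketchApp(boolean ignoreStartAtKilled) {
        if (ignoreStartAtKilled) {
            ignored.put(SketchState.KILLED, EnumSet.of(SketchEvent.START));
        }
    }

    void handle(SketchEvent event) {
        EnumSet<SketchEvent> skip = ignored.get(state);
        if (skip != null && skip.contains(event)) {
            return; // late event for a finished app: nothing to do
        }
        if (state == SketchState.NEW && event == SketchEvent.START) {
            state = SketchState.RUNNING;
        } else if (state != SketchState.KILLED && event == SketchEvent.KILL) {
            state = SketchState.KILLED;
        } else {
            // No transition registered: record the error and keep the state.
            errors.add("Invalid event: " + event + " at " + state);
        }
    }

    // Feeds the events in FIFO order, as a single dispatcher thread would.
    static SketchApp run(boolean ignoreStartAtKilled, SketchEvent... events) {
        SketchApp app = new SketchApp(ignoreStartAtKilled);
        Queue<SketchEvent> queue = new ArrayDeque<>(Arrays.asList(events));
        while (!queue.isEmpty()) {
            app.handle(queue.poll());
        }
        return app;
    }
}

class RaceSketch {
    public static void main(String[] args) {
        // KILL overtakes START, which is the ordering the sleep forces.
        System.out.println(SketchApp.run(false, SketchEvent.KILL, SketchEvent.START).errors);
        // Same ordering, with START registered as ignorable at KILLED.
        System.out.println(SketchApp.run(true, SketchEvent.KILL, SketchEvent.START).errors);
    }
}
```

In this toy model the first run records one "Invalid event: START at KILLED" error and the second records none, while the application ends in KILLED both times. Whether silently dropping the event is the right behaviour for the real ResourceManager is a design decision for the patch review, not something this sketch settles.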
[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED
[ https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310909#comment-16310909 ]

lujie edited comment on YARN-7663 at 1/10/18 3:03 AM:

Following Jason Lowe's useful suggestion, I rewrote the unit test and attached a new patch. The patch does three things:
1. Adds an empty protected method, onInvalidStateTransition, and calls it from the code block in which RMAppImpl handles an InvalidStateTransition.
2. Creates a new final class, RMAppImplForTest, which overrides onInvalidStateTransition. createNewTestApp now creates an RMAppImplForTest object instead of an RMAppImpl.
3. Fixes this bug by ignoring the event.

However, two other InvalidStateTransitions show up while testing:
1. testAppAcceptedFailed: APP_ACCEPTED at state ACCEPTED
2. testAppRunningFailed: APP_UPDATE_SAVED at state KILLED

These two InvalidStateTransitions may be bugs in RMAppImpl, or they may be bugs in TestRMAppTransitions. Would it be better to defer them to another JIRA? Or can we just ignore the event without a test, as was done in [YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?

was (Author: xiaoheipangzi):
Following Jason Lowe's useful suggestion, I rewrote the unit test and attached a new patch. The patch does three things:
1. Adds an empty protected method, onInvalidStateTransition, and calls it from the code block in which RMAppImpl handles an InvalidStateTransition.
2. Creates a new final class, RMAppImplForTest, which overrides onInvalidStateTransition. createNewTestApp now creates an RMAppImplForTest object instead of an RMAppImpl.
3. Fixes this bug by ignoring the event.

However, two other InvalidStateTransitions show up while testing:
1. testAppAcceptedFailed: APP_ACCEPTED at state ACCEPTED
2. testAppRunningFailed: APP_UPDATE_SAVED at state KILLED

These two InvalidStateTransitions may be bugs in RMAppImpl, or they may be bugs in TestRMAppTransitions. Would it be better to defer them to another JIRA? Or can we just ignore the event without a test, as was done in [YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?

> RMAppImpl:Invalid event: START at KILLED
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.8.0
> Reporter: lujie
> Assignee: lujie
> Priority: Minor
> Labels: patch
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, YARN-7663_4.patch, YARN-7663_5.patch, YARN-7663_6.patch, YARN-7663_7.patch
>
> After a kill is sent to the application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: START at KILLED
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
>     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
>     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted just before the point where the START event is created, this bug reproduces deterministically.
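The test hook described in the comment above (an empty protected method that a test-only subclass overrides) can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in the attached patches: HookApp and HookAppForTest are hypothetical stand-ins for RMAppImpl and RMAppImplForTest, the states and events are simplified, and only the method name onInvalidStateTransition comes from the comment. The START-at-KILLED case is deliberately left unhandled here so that the hook has something to report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the real state and event enums.
enum HookState { NEW, ACCEPTED, KILLED }
enum HookEvent { START, KILL }

// Stand-in for the production class.
class HookApp {
    private HookState state = HookState.NEW;

    public HookState getState() {
        return state;
    }

    public void handle(HookEvent event) {
        try {
            state = transition(state, event);
        } catch (IllegalStateException e) {
            // Production behaviour: log the problem and keep the current state.
            System.err.println("Can't handle this event at current state: " + e.getMessage());
            // Call site of the hook; a no-op unless a subclass overrides it.
            onInvalidStateTransition(event, state);
        }
    }

    // Intentionally empty in production; exists so tests can observe failures.
    protected void onInvalidStateTransition(HookEvent event, HookState current) {
    }

    private static HookState transition(HookState current, HookEvent event) {
        if (current == HookState.NEW && event == HookEvent.START) {
            return HookState.ACCEPTED;
        }
        if (event == HookEvent.KILL) {
            return HookState.KILLED;
        }
        throw new IllegalStateException("Invalid event: " + event + " at " + current);
    }
}

// Test-only subclass: turns a logged error into something a test can check.
final class HookAppForTest extends HookApp {
    final List<String> invalid = new ArrayList<>();

    @Override
    protected void onInvalidStateTransition(HookEvent event, HookState current) {
        invalid.add(event + " at " + current);
    }
}
```

A test would then build a HookAppForTest, feed it events, and assert that the invalid list is empty (or fail immediately inside the override). The benefit of this shape is that invalid transitions, which production code only logs, become visible to every existing test that constructs the application through the test factory; that is also how unrelated cases such as APP_ACCEPTED at ACCEPTED could surface.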
[jira] [Created] (YARN-7726) RMAppImpl: can't handle APP_ACCEPTED at state ACCEPTED
lujie created YARN-7726:
---
Summary: RMAppImpl: can't handle APP_ACCEPTED at state ACCEPTED
Key: YARN-7726
URL: https://issues.apache.org/jira/browse/YARN-7726
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 2.8.3
Reporter: lujie
Priority: Minor

While adding a patch to TestRMAppTransitions, the patch triggered the error message "can't handle APP_ACCEPTED at state ACCEPTED" in the unit tests testAppAcceptedFailed and testAppRunningFailed.