[jira] [Created] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING

2017-08-04 Thread lujie (JIRA)
lujie created YARN-6948:
---

 Summary: Invalid event: ATTEMPT_ADDED at FINAL_SAVING
 Key: YARN-6948
 URL: https://issues.apache.org/jira/browse/YARN-6948
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.8.0
Reporter: lujie


After sending a kill command to a running job, I checked the logs and found this
exception:

{code:java}
2017-08-03 01:35:20,485 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
ATTEMPT_ADDED at FINAL_SAVING
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:745)
{code}




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6950) Invalid event: LAUNCH_FAILED at FAILED

2017-08-04 Thread lujie (JIRA)
lujie created YARN-6950:
---

 Summary: Invalid event: LAUNCH_FAILED at FAILED
 Key: YARN-6950
 URL: https://issues.apache.org/jira/browse/YARN-6950
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.6.0
Reporter: lujie


An RMAppAttemptImpl fails for some reason; meanwhile, the AM fails to launch a
container and sends a LAUNCH_FAILED event, which the state machine cannot handle:

{code:java}
2017-07-05 03:33:09,013 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
LAUNCH_FAILED at FAILED
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:745)
{code}







[jira] [Created] (YARN-6949) Invalid event: LOCALIZED at LOCALIZED

2017-08-04 Thread lujie (JIRA)
lujie created YARN-6949:
---

 Summary: Invalid event: LOCALIZED at LOCALIZED
 Key: YARN-6949
 URL: https://issues.apache.org/jira/browse/YARN-6949
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.8.0
Reporter: lujie


While a job was running, I stopped a NodeManager on one machine. When I then
checked the logs to see the running state, I found many
InvalidStateTransitionException errors:

{code:java}
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
LOCALIZATION_FAILED at LOCALIZED
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource.handle(LocalizedResource.java:198)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.handle(LocalResourcesTrackerImpl.java:194)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.handle(LocalResourcesTrackerImpl.java:58)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1058)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:720)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:355)
at 
org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)
at 
org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:63)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:845)
{code}







[jira] [Commented] (YARN-6949) Invalid event: LOCALIZATION_FAILED at LOCALIZED

2017-08-04 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115220#comment-16115220
 ] 

lujie commented on YARN-6949:
-

I checked the log and also found some NullPointerExceptions:

{code:java}
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:505)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.getPathForLocalization(ResourceLocalizationService.java:1131)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.processHeartbeat(ResourceLocalizationService.java:1093)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:720)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:355)
at 
org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:48)

{code}









[jira] [Commented] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING

2017-08-06 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115988#comment-16115988
 ] 

lujie commented on YARN-6948:
-

From the actual logs:
# RMAppImpl: application_1501695223072_0001 State change from NEW to NEW_SAVING
# RMAppImpl: application_1501695223072_0001 State change from SUBMITTED to 
ACCEPTED
# RMAppAttemptImpl: appattempt_1501695223072_0001_01 State change from NEW 
to SUBMITTED
# RMAppImpl: application_1501695223072_0001 State change from ACCEPTED to 
KILLING
# CapacityScheduler: Added Application Attempt 
appattempt_1501695223072_0001_01 to scheduler from user lujie in queue 
default
# RMAppAttemptImpl: appattempt_1501695223072_0001_01 State change from 
SUBMITTED to FINAL_SAVING
# RMAppAttemptImpl: Invalid event: ATTEMPT_ADDED at FINAL_SAVING
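
The sequence above can be modeled with a minimal sketch, assuming a simplified table-driven state machine. The class AttemptStateMachineSketch, its State and Event enums, and the three registered transitions are hypothetical stand-ins chosen for illustration; this is not the actual RMAppAttemptImpl transition table. The sketch only shows the general mechanism: an event with no registered transition for the current state is rejected, so an ATTEMPT_ADDED that arrives after KILL has already moved the attempt to FINAL_SAVING fails.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical, simplified model; not the real RMAppAttemptImpl transition table. */
final class AttemptStateMachineSketch {

    enum State { NEW, SUBMITTED, SCHEDULED, FINAL_SAVING }

    enum Event { START, ATTEMPT_ADDED, KILL }

    // Transition table: current state -> (event -> next state).
    private static final Map<State, Map<Event, State>> TABLE =
            new EnumMap<State, Map<Event, State>>(State.class);

    static {
        // Assumed transitions, chosen only to mirror the log sequence.
        register(State.NEW, Event.START, State.SUBMITTED);
        register(State.SUBMITTED, Event.ATTEMPT_ADDED, State.SCHEDULED);
        register(State.SUBMITTED, Event.KILL, State.FINAL_SAVING);
        // Deliberately no entry for (FINAL_SAVING, ATTEMPT_ADDED).
    }

    private static void register(State from, Event on, State to) {
        TABLE.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class))
             .put(on, to);
    }

    private State current = State.NEW;

    State getState() {
        return current;
    }

    /** Applies one event, or throws if no transition is registered for it. */
    State handle(Event event) {
        Map<Event, State> row = TABLE.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException(
                    "Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }

    public static void main(String[] args) {
        AttemptStateMachineSketch attempt = new AttemptStateMachineSketch();
        attempt.handle(Event.START); // NEW -> SUBMITTED
        attempt.handle(Event.KILL);  // SUBMITTED -> FINAL_SAVING
        try {
            // The scheduler's notification is processed after the kill.
            attempt.handle(Event.ATTEMPT_ADDED);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the failure depends only on the order in which the two events are processed: if ATTEMPT_ADDED is handled before KILL, both transitions are registered and no exception is thrown.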









[jira] [Updated] (YARN-7176) After job kill command is send, the ResourceManager was killed

2017-09-08 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7176:

Description: 
I submitted a job but had to kill it immediately. I then found that the RM had
been killed.
I checked the RM log and found an ArrayIndexOutOfBoundsException and a
NullPointerException. According to the log, the RM was killed by the
NullPointerException, but I still don't understand why these exceptions happen.
I have attached the whole RM log.
{code:java}
2017-09-08 02:34:37,967 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
launching appattempt_1504809243340_0001_01. Got exception: 
java.lang.ArrayIndexOutOfBoundsException: 3
at java.util.ArrayList.add(ArrayList.java:441)
at 
com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

2017-09-08 02:34:37,968 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
updating app: application_1504809243340_0001
java.lang.NullPointerException
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.getSerializedSize(YarnServerResourceManagerRecoveryProtos.java:816)
at 
com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationStateInternal(FileSystemRMStateStore.java:426)
at 

[jira] [Updated] (YARN-7176) After kill command is send, the job hangs

2017-09-07 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7176:

Attachment: logs.rar


[jira] [Updated] (YARN-7176) After kill command is send, the job hangs

2017-09-07 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7176:


[^C:\Users\Administrator\Desktop\logs.zip]


[jira] [Issue Comment Deleted] (YARN-7176) After kill command is send, the job hangs

2017-09-07 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7176:

Comment: was deleted

(was: [^C:\Users\Administrator\Desktop\logs.zip])


[jira] [Created] (YARN-7176) After kill command is send, the job hangs

2017-09-07 Thread lujie (JIRA)
lujie created YARN-7176:
---

 Summary: After kill command is send, the job hangs 
 Key: YARN-7176
 URL: https://issues.apache.org/jira/browse/YARN-7176
 Project: Hadoop YARN
  Issue Type: Bug
  Components: RM
Affects Versions: 2.6.0
Reporter: lujie
Priority: Critical


I submitted a job but needed to kill it immediately. I then found that the job 
had hung.
I checked the log and found an ArrayIndexOutOfBoundsException and a 
NullPointerException in the RM log:

{code:java}
2017-09-08 02:34:37,967 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
launching appattempt_1504809243340_0001_01. Got exception: 
java.lang.ArrayIndexOutOfBoundsException: 3
at java.util.ArrayList.add(ArrayList.java:441)
at 
com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

2017-09-08 02:34:37,968 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
updating app: application_1504809243340_0001
java.lang.NullPointerException
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.getSerializedSize(YarnServerResourceManagerRecoveryProtos.java:816)
at 
com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62)
at 

[jira] [Updated] (YARN-7176) After kill command is send, the job hangs

2017-09-08 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7176:

Description: 
I submitted a job but needed to kill it immediately. I then found that the RM 
had been killed.
I checked the log and found an ArrayIndexOutOfBoundsException and a 
NullPointerException in the RM log:

{code:java}
2017-09-08 02:34:37,967 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
launching appattempt_1504809243340_0001_01. Got exception: 
java.lang.ArrayIndexOutOfBoundsException: 3
at java.util.ArrayList.add(ArrayList.java:441)
at 
com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

2017-09-08 02:34:37,968 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
updating app: application_1504809243340_0001
java.lang.NullPointerException
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.getSerializedSize(YarnServerResourceManagerRecoveryProtos.java:816)
at 
com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationStateInternal(FileSystemRMStateStore.java:426)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppTransition.transition(RMStateStore.java:163)
at 

[jira] [Updated] (YARN-7176) After job kill command is send, the ResourceManager was killed

2017-09-08 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7176:

Description: 
I submitted a job but needed to kill it immediately. I then found that the RM 
had been killed.
I checked the RM log and found an ArrayIndexOutOfBoundsException and a 
NullPointerException:

{code:java}
2017-09-08 02:34:37,967 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
launching appattempt_1504809243340_0001_01. Got exception: 
java.lang.ArrayIndexOutOfBoundsException: 3
at java.util.ArrayList.add(ArrayList.java:441)
at 
com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

2017-09-08 02:34:37,968 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
updating app: application_1504809243340_0001
java.lang.NullPointerException
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.getSerializedSize(YarnServerResourceManagerRecoveryProtos.java:816)
at 
com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationStateInternal(FileSystemRMStateStore.java:426)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppTransition.transition(RMStateStore.java:163)
at 

[jira] [Updated] (YARN-7176) After job kill command is send, the ResourceManager was killed

2017-09-08 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7176:

Description: 
I submitted a job but needed to kill it immediately. I then found that the RM 
had been killed.
I checked the RM log and found an ArrayIndexOutOfBoundsException and a 
NullPointerException:

{code:java}
2017-09-08 02:34:37,967 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
launching appattempt_1504809243340_0001_01. Got exception: 
java.lang.ArrayIndexOutOfBoundsException: 3
at java.util.ArrayList.add(ArrayList.java:441)
at 
com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

2017-09-08 02:34:37,968 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
updating app: application_1504809243340_0001
java.lang.NullPointerException
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.getSerializedSize(YarnServerResourceManagerRecoveryProtos.java:816)
at 
com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationStateInternal(FileSystemRMStateStore.java:426)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppTransition.transition(RMStateStore.java:163)
at 

[jira] [Updated] (YARN-7176) After job kill command is send, the ResourceManager was killed

2017-09-08 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7176:

Description: 
I submitted a job but needed to kill it immediately. I then found that the RM 
had been killed.
I checked the RM log and found an ArrayIndexOutOfBoundsException and a 
NullPointerException. According to the log, the RM was killed because of the 
NullPointerException, but I still don't understand why these exceptions happen.
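
A possible explanation, which this report does not confirm: the two traces are logged one millisecond apart by different components (AMLauncher and RMStateStore), and both serialize a ContainerLaunchContextProto. An ArrayIndexOutOfBoundsException thrown from inside ArrayList.add is a typical symptom of two threads mutating the same unsynchronized list. The sketch below is only an illustration of that assumption and of one way to guard against it. The class names, the getProto() body, and the ACL strings are hypothetical stand-ins, not the YARN implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedBuilderSketch {

    /** Hypothetical stand-in for a record that copies local fields into a shared builder list. */
    static final class LaunchContextSketch {
        private final List<String> localAcls;
        private final List<String> builderAcls = new ArrayList<>();

        LaunchContextSketch(List<String> acls) {
            this.localAcls = new ArrayList<>(acls);
        }

        // synchronized so that only one thread at a time rebuilds the shared list;
        // without it, concurrent clear()/addAll() calls on an ArrayList are unsafe.
        synchronized List<String> getProto() {
            builderAcls.clear();
            builderAcls.addAll(localAcls);
            return new ArrayList<>(builderAcls); // private snapshot for the caller
        }
    }

    /** Serializes the same object from two threads and returns the number of bad results. */
    static int countBadSnapshots(int rounds) {
        LaunchContextSketch ctx =
            new LaunchContextSketch(Arrays.asList("VIEW_APP", "MODIFY_APP"));
        AtomicInteger bad = new AtomicInteger();
        Runnable task = () -> {
            for (int i = 0; i < rounds; i++) {
                try {
                    if (ctx.getProto().size() != 2) {
                        bad.incrementAndGet(); // snapshot with missing or extra entries
                    }
                } catch (RuntimeException e) {
                    bad.incrementAndGet(); // e.g. ArrayIndexOutOfBoundsException
                }
            }
        };
        // Thread names are illustrative only.
        Thread launcher = new Thread(task, "launcher-sketch");
        Thread stateStore = new Thread(task, "state-store-sketch");
        launcher.start();
        stateStore.start();
        try {
            launcher.join();
            stateStore.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return bad.get();
    }

    public static void main(String[] args) {
        System.out.println("bad snapshots: " + countBadSnapshots(100_000));
    }
}
```

With the synchronized keyword in place, every snapshot has exactly two entries. Removing it makes the behavior undefined: the run may pass, return short lists, or throw from ArrayList.add, depending on thread timing.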

{code:java}
2017-09-08 02:34:37,967 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
launching appattempt_1504809243340_0001_01. Got exception: 
java.lang.ArrayIndexOutOfBoundsException: 3
at java.util.ArrayList.add(ArrayList.java:441)
at 
com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

2017-09-08 02:34:37,968 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
updating app: application_1504809243340_0001
java.lang.NullPointerException
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.getSerializedSize(YarnServerResourceManagerRecoveryProtos.java:816)
at 
com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationStateInternal(FileSystemRMStateStore.java:426)
at 

[jira] [Updated] (YARN-7176) After job kill command is send, the ResourceManager was killed

2017-09-08 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7176:

Description: 
I submitted a job but needed to kill it immediately. I then found that the RM 
had been killed.
I checked the RM log and found an ArrayIndexOutOfBoundsException and a 
NullPointerException. According to the log, the RM was killed because of the 
NullPointerException, but I still don't understand why these exceptions happen.
I have attached the whole RM log.
{code:java}
2017-09-08 02:34:37,967 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
launching appattempt_1504809243340_0001_01. Got exception: 
java.lang.ArrayIndexOutOfBoundsException: 3
at java.util.ArrayList.add(ArrayList.java:441)
at 
com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

2017-09-08 02:34:37,968 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
updating app: application_1504809243340_0001
java.lang.NullPointerException
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.getSerializedSize(YarnServerResourceManagerRecoveryProtos.java:816)
at 
com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationStateInternal(FileSystemRMStateStore.java:426)
at 

[jira] [Updated] (YARN-7176) After job kill command is send, the ResourceManager was killed

2017-09-08 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7176:

Description: 
I submitted a job but needed to kill it immediately. I then found that the RM 
had been killed.
I checked the RM log and found an ArrayIndexOutOfBoundsException and a 
NullPointerException. According to the log, the RM was killed because of the 
NullPointerException, but I still don't understand why these exceptions happen.
I have attached the whole RM log.
{code:java}
2017-09-08 02:34:37,967 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
launching appattempt_1504809243340_0001_01. Got exception: 
java.lang.ArrayIndexOutOfBoundsException: 3
at java.util.ArrayList.add(ArrayList.java:441)
at 
com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

2017-09-08 02:34:37,968 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
updating app: application_1504809243340_0001
java.lang.NullPointerException
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.getSerializedSize(YarnServerResourceManagerRecoveryProtos.java:816)
at 
com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationStateInternal(FileSystemRMStateStore.java:426)
at 

[jira] [Updated] (YARN-7176) After kill command is send, the ResourceManager was killed

2017-09-08 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7176:

Summary: After kill command is send, the ResourceManager was killed   (was: 
After kill command is send, the job hangs )

> After kill command is send, the ResourceManager was killed 
> ---
>
> Key: YARN-7176
> URL: https://issues.apache.org/jira/browse/YARN-7176
> Project: Hadoop YARN
> Issue Type: Bug
> Components: RM
> Affects Versions: 2.6.0
> Reporter: lujie
> Priority: Critical
> Attachments: logs.rar
>
>
> I submitted a job but needed to kill it immediately for some reason. I then 
> found that the RM had been killed.
> I checked the log and found an ArrayIndexOutOfBoundsException and a 
> NullPointerException in the RM log:
> {code:java}
> 2017-09-08 02:34:37,967 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
> launching appattempt_1504809243340_0001_01. Got exception: 
> java.lang.ArrayIndexOutOfBoundsException: 3
>   at java.util.ArrayList.add(ArrayList.java:441)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
>   at 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 2017-09-08 02:34:37,968 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> updating app: application_1504809243340_0001
> java.lang.NullPointerException
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>   at 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>   at 
> org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>   at 
> 

[jira] [Updated] (YARN-7176) After job kill command is send, the ResourceManager was killed

2017-09-08 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7176:

Summary: After job kill command is send, the ResourceManager was killed   
(was: After kill command is send, the ResourceManager was killed )

> After job kill command is send, the ResourceManager was killed 
> ---
>
> Key: YARN-7176
> URL: https://issues.apache.org/jira/browse/YARN-7176
> Project: Hadoop YARN
> Issue Type: Bug
> Components: RM
> Affects Versions: 2.6.0
> Reporter: lujie
> Priority: Critical
> Attachments: logs.rar
>
>
> I submitted a job but needed to kill it immediately for some reason. I then 
> found that the RM had been killed.
> I checked the log and found an ArrayIndexOutOfBoundsException and a 
> NullPointerException in the RM log:
> {code:java}
> 2017-09-08 02:34:37,967 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
> launching appattempt_1504809243340_0001_01. Got exception: 
> java.lang.ArrayIndexOutOfBoundsException: 3
>   at java.util.ArrayList.add(ArrayList.java:441)
>   at 
> com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
>   at 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
>   at 
> org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
>   at 
> org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 2017-09-08 02:34:37,968 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> updating app: application_1504809243340_0001
> java.lang.NullPointerException
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>   at 
> org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
>   at 
> org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
>   at 
> com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
>   at 
> 

[jira] [Updated] (YARN-7176) After job kill command is send, the ResourceManager was killed

2017-09-08 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7176:

Description: 
I submitted a job but needed to kill it immediately for some reason. I then 
found that the RM had been killed.
I checked the RM log and found an ArrayIndexOutOfBoundsException and a 
NullPointerException:

{code:java}
2017-09-08 02:34:37,967 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
launching appattempt_1504809243340_0001_01. Got exception: 
java.lang.ArrayIndexOutOfBoundsException: 3
at java.util.ArrayList.add(ArrayList.java:441)
at 
com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

2017-09-08 02:34:37,968 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
updating app: application_1504809243340_0001
java.lang.NullPointerException
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.getSerializedSize(YarnServerResourceManagerRecoveryProtos.java:816)
at 
com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationStateInternal(FileSystemRMStateStore.java:426)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppTransition.transition(RMStateStore.java:163)
at 

[jira] [Updated] (YARN-7176) After job kill command is send, the ResourceManager was killed

2017-09-08 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7176:

Description: 
I submitted a job but needed to kill it immediately for some reason. I then 
found that the RM had been killed.
I checked the log and found an ArrayIndexOutOfBoundsException and a 
NullPointerException in the RM log:

{code:java}
2017-09-08 02:34:37,967 INFO 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
launching appattempt_1504809243340_0001_01. Got exception: 
java.lang.ArrayIndexOutOfBoundsException: 3
at java.util.ArrayList.add(ArrayList.java:441)
at 
com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$Builder.addAllApplicationACLs(YarnProtos.java:39956)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.addApplicationACLs(ContainerLaunchContextPBImpl.java:446)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToBuilder(ContainerLaunchContextPBImpl.java:121)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.mergeLocalToProto(ContainerLaunchContextPBImpl.java:128)
at 
org.apache.hadoop.yarn.api.records.impl.pb.ContainerLaunchContextPBImpl.getProto(ContainerLaunchContextPBImpl.java:70)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.convertToProtoFormat(StartContainerRequestPBImpl.java:156)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToBuilder(StartContainerRequestPBImpl.java:85)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.mergeLocalToProto(StartContainerRequestPBImpl.java:95)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainerRequestPBImpl.getProto(StartContainerRequestPBImpl.java:57)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.convertToProtoFormat(StartContainersRequestPBImpl.java:137)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.addLocalRequestsToProto(StartContainersRequestPBImpl.java:97)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToBuilder(StartContainersRequestPBImpl.java:79)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.mergeLocalToProto(StartContainersRequestPBImpl.java:72)
at 
org.apache.hadoop.yarn.api.protocolrecords.impl.pb.StartContainersRequestPBImpl.getProto(StartContainersRequestPBImpl.java:48)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:93)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:119)
at 
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

2017-09-08 02:34:37,968 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
updating app: application_1504809243340_0001
java.lang.NullPointerException
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.getSerializedSize(YarnProtos.java:38512)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnProtos$ApplicationSubmissionContextProto.getSerializedSize(YarnProtos.java:28481)
at 
com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749)
at 
com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530)
at 
org.apache.hadoop.yarn.proto.YarnServerResourceManagerRecoveryProtos$ApplicationStateDataProto.getSerializedSize(YarnServerResourceManagerRecoveryProtos.java:816)
at 
com.google.protobuf.AbstractMessageLite.toByteArray(AbstractMessageLite.java:62)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.updateApplicationStateInternal(FileSystemRMStateStore.java:426)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppTransition.transition(RMStateStore.java:163)
at 

[jira] [Comment Edited] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING

2017-11-27 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266684#comment-16266684
 ] 

lujie edited comment on YARN-6948 at 11/27/17 12:02 PM:


Is the test failure related to this patch?


was (Author: xiaoheipangzi):
I don't think the test failure is related to this patch.

> Invalid event: ATTEMPT_ADDED at FINAL_SAVING
> 
>
> Key: YARN-6948
> URL: https://issues.apache.org/jira/browse/YARN-6948
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 2.8.0, 3.0.0-alpha4
> Reporter: lujie
> Attachments: yarn-6948.png, yarn-6948.txt
>
>
> When I sent a kill command to a running job, I checked the logs and found 
> this exception:
> {code:java}
> 2017-08-03 01:35:20,485 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ATTEMPT_ADDED at FINAL_SAVING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Commented] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING

2017-11-27 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266684#comment-16266684
 ] 

lujie commented on YARN-6948:
-

I don't think the test failure is related to this patch.

> Invalid event: ATTEMPT_ADDED at FINAL_SAVING
> 
>
> Key: YARN-6948
> URL: https://issues.apache.org/jira/browse/YARN-6948
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 2.8.0, 3.0.0-alpha4
> Reporter: lujie
> Attachments: yarn-6948.png, yarn-6948.txt
>
>
> When I sent a kill command to a running job, I checked the logs and found 
> this exception:
> {code:java}
> 2017-08-03 01:35:20,485 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ATTEMPT_ADDED at FINAL_SAVING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7563:

Description: 
I sent a kill command to an application, and the NodeManager log shows:

{code:java}
2017-11-25 19:18:48,126 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 couldn't find container container_1511608703018_0001_01_01 while 
processing FINISH_CONTAINERS event
2017-11-25 19:18:48,146 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
 Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
FINISH_APPLICATION at NEW
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
at java.lang.Thread.run(Thread.java:745)
2017-11-25 19:18:48,151 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
 Application application_1511608703018_0001 transitioned from NEW to INITING
{code}
 

  was:
I sent a kill command to an application, and the NodeManager log shows:

{code:java}
2017-11-25 19:18:48,126 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 couldn't find container container_1511608703018_0001_01_01 while 
processing FINISH_CONTAINERS event
2017-11-25 19:18:48,146 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
 Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
FINISH_APPLICATION at NEW
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
at java.lang.Thread.run(Thread.java:745)
2017-11-25 19:18:48,151 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
 Application application_1511608703018_0001 transitioned from NEW to INITING
{code}



> Invalid event: FINISH_APPLICATION at NEW
> 
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 3.0.0-beta1
> Reporter: lujie
>
> I sent a kill command to an application, and the NodeManager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  couldn't find container container_1511608703018_0001_01_01 while 
> processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> FINISH_APPLICATION at NEW
> at 
> 

[jira] [Created] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)
lujie created YARN-7563:
---

 Summary: Invalid event: FINISH_APPLICATION at NEW
 Key: YARN-7563
 URL: https://issues.apache.org/jira/browse/YARN-7563
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 3.0.0-beta1
Reporter: lujie


I sent a kill command to an application, and the NodeManager log shows:

{code:java}
2017-11-25 19:18:48,126 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 couldn't find container container_1511608703018_0001_01_01 while 
processing FINISH_CONTAINERS event
2017-11-25 19:18:48,146 WARN 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
 Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
FINISH_APPLICATION at NEW
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
at java.lang.Thread.run(Thread.java:745)
2017-11-25 19:18:48,151 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
 Application application_1511608703018_0001 transitioned from NEW to INITING
{code}







[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7563:

Attachment: (was: YARN-7563.png)







[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7563:

Attachment: (was: YARN-7536.png)







[jira] [Comment Edited] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266987#comment-16266987
 ] 

lujie edited comment on YARN-7563 at 11/27/17 4:06 PM:
---

I have found the cause by analyzing the code and the logs.

!YARN-7536.png!

The figure above shows the cause. The client submits an application and then
sends a kill command. The NM starts the container through
ContainerManagerImpl.startContainerInternal, which (1) puts the appID into the
context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns
through ResourceTrackerService.nodeHeartbeat which apps need to be cleaned up
and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl
first (2) checks whether the appID exists in the context and, if it does,
(3) sends FINISH_APPLICATION.
The bug manifests only when two conditions both hold: (1) happens before (2),
and (3) happens before (4). If either one is violated, the bug stays hidden.
I need to check the ApplicationImpl code further to determine whether
AppFinishTriggeredTransition is needed to fix this bug.









[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7563:

Attachment: YARN-7536.png







[jira] [Comment Edited] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266987#comment-16266987
 ] 

lujie edited comment on YARN-7563 at 11/27/17 4:07 PM:
---

I have found the cause by analyzing the code and the logs.

!YARN-7563.png!

The figure above shows the cause. The client submits an application and then
sends a kill command. The NM starts the container through
ContainerManagerImpl.startContainerInternal, which (1) puts the appID into the
context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns
through ResourceTrackerService.nodeHeartbeat which apps need to be cleaned up
and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl
first (2) checks whether the appID exists in the context and, if it does,
(3) sends FINISH_APPLICATION.
The bug manifests only when two conditions both hold: (1) happens before (2),
and (3) happens before (4). If either one is violated, the bug stays hidden.
I need to check the ApplicationImpl code further to determine whether
AppFinishTriggeredTransition is needed to fix this bug.









[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7563:

Attachment: YARN-7563.png







[jira] [Comment Edited] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266987#comment-16266987
 ] 

lujie edited comment on YARN-7563 at 11/27/17 4:10 PM:
---

I have found the cause by analyzing the code and the logs.
[^YARN-7563.png]
The figure above shows the cause. The client submits an application and then
sends a kill command. The NM starts the container through
ContainerManagerImpl.startContainerInternal, which (1) puts the appID into the
context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns
through ResourceTrackerService.nodeHeartbeat which apps need to be cleaned up
and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl
first (2) checks whether the appID exists in the context and, if it does,
(3) sends FINISH_APPLICATION.
The bug manifests only when two conditions both hold: (1) happens before (2),
and (3) happens before (4). If either one is violated, the bug stays hidden.
I need to check the ApplicationImpl code further to determine whether
AppFinishTriggeredTransition is needed to fix this bug.
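
The ordering argument above can be replayed with a small, deterministic
sketch. It is only an illustration under stated assumptions, not the
NodeManager implementation: the class FinishAtNewSketch, the FINISHING state,
the transition table, and the replay helper are all hypothetical, and "sending"
an event is modeled as handling it immediately rather than queuing it on a
dispatcher. The numbered steps are the ones named in the comment.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the race; this is NOT the NodeManager code.
class FinishAtNewSketch {
    enum State { NEW, INITING, FINISHING }   // FINISHING is a made-up state name
    enum Event { INIT_APPLICATION, FINISH_APPLICATION }

    // Only the (state, event) pairs registered here are legal transitions.
    static final Map<State, Map<Event, State>> TABLE = new EnumMap<>(State.class);
    static {
        for (State s : State.values()) {
            TABLE.put(s, new EnumMap<>(Event.class));
        }
        TABLE.get(State.NEW).put(Event.INIT_APPLICATION, State.INITING);
        TABLE.get(State.INITING).put(Event.FINISH_APPLICATION, State.FINISHING);
        // No entry for (NEW, FINISH_APPLICATION): that pair is the invalid event.
    }

    private State state = State.NEW;
    private final Set<String> context = new HashSet<>();
    private final List<String> log = new ArrayList<>();
    private boolean appFound = false;

    // Looks up the transition and records a warning when none is registered.
    private void handle(Event event) {
        State next = TABLE.get(state).get(event);
        if (next == null) {
            log.add("Invalid event: " + event + " at " + state);
            return;
        }
        log.add(state + " -> " + next);
        state = next;
    }

    // Runs the numbered steps from the comment in the given order.
    static List<String> replay(int... steps) {
        FinishAtNewSketch nm = new FinishAtNewSketch();
        String appId = "app_0001"; // placeholder ID, not taken from the report
        for (int step : steps) {
            switch (step) {
                case 1: nm.context.add(appId); break;                      // put appID in context
                case 2: nm.appFound = nm.context.contains(appId); break;   // check appID in context
                case 3: if (nm.appFound) { nm.handle(Event.FINISH_APPLICATION); } break;
                case 4: nm.handle(Event.INIT_APPLICATION); break;
                default: throw new IllegalArgumentException("unknown step " + step);
            }
        }
        return nm.log;
    }

    public static void main(String[] args) {
        System.out.println(replay(1, 2, 3, 4)); // (1) before (2) and (3) before (4)
        System.out.println(replay(1, 2, 4, 3)); // (4) before (3)
        System.out.println(replay(2, 3, 1, 4)); // (2) before (1)
    }
}
```

In this model only the order 1, 2, 3, 4 records the invalid event, followed by
the NEW to INITING transition, which is the same order as the two
ApplicationImpl log lines in the report. The other two orders record no
invalid event, which corresponds to the bug staying hidden.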









[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7563:

Attachment: YARN-7536.png







[jira] [Comment Edited] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266987#comment-16266987
 ] 

lujie edited comment on YARN-7563 at 11/27/17 4:05 PM:
---

I have found the cause by analyzing the code and the logs.

[^YARN-7536.png]

The figure above shows the cause. The client submits an application and then
sends a kill command. The NM starts the container through
ContainerManagerImpl.startContainerInternal, which (1) puts the appID into the
context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns
through ResourceTrackerService.nodeHeartbeat which apps need to be cleaned up
and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl
first (2) checks whether the appID exists in the context and, if it does,
(3) sends FINISH_APPLICATION.
The bug manifests only when two conditions both hold: (1) happens before (2),
and (3) happens before (4). If either one is violated, the bug stays hidden.
I need to check the ApplicationImpl code further to determine whether
AppFinishTriggeredTransition is needed to fix this bug.









[jira] [Commented] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266994#comment-16266994
 ] 

lujie commented on YARN-7563:
-

!YARN-7563.png!

> Invalid event: FINISH_APPLICATION at NEW
> 
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: lujie
> Attachments: YARN-7563.png
>
>
> I send kill command to application, nodemanager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  couldn't find container container_1511608703018_0001_01_01 while 
> processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> FINISH_APPLICATION at NEW
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> 2017-11-25 19:18:48,151 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1511608703018_0001 transitioned from NEW to INITING
> {code}
>  
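
The quoted log shows a WARN for the rejected event followed by an INFO for a normal NEW to INITING transition, which suggests the handler catches the invalid transition, logs it, and leaves the state unchanged. A minimal sketch of that pattern, assuming a simplified one-arc table; HandleAndLogSketch and its members are hypothetical names and not Hadoop classes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an event handler; not the Hadoop implementation.
class HandleAndLogSketch {
    enum State { NEW, INITING }
    enum Event { INIT_APPLICATION, FINISH_APPLICATION }

    final List<String> log = new ArrayList<>();
    State state = State.NEW;

    // Assumed table with a single arc: NEW --INIT_APPLICATION--> INITING.
    private State doTransition(Event event) {
        if (state == State.NEW && event == Event.INIT_APPLICATION) {
            return State.INITING;
        }
        throw new IllegalStateException("Invalid event: " + event + " at " + state);
    }

    // Catch the rejected event, record a warning, and keep the old state.
    void handle(Event event) {
        State old = state;
        try {
            state = doTransition(event);
        } catch (IllegalStateException e) {
            log.add("WARN Can't handle this event at current state: " + e.getMessage());
        }
        if (old != state) {
            log.add("INFO transitioned from " + old + " to " + state);
        }
    }

    public static void main(String[] args) {
        HandleAndLogSketch app = new HandleAndLogSketch();
        app.handle(Event.FINISH_APPLICATION); // rejected, state stays NEW
        app.handle(Event.INIT_APPLICATION);   // accepted, NEW -> INITING
        app.log.forEach(System.out::println);
    }
}
```

In this sketch the rejected FINISH_APPLICATION is simply dropped, so the application continues initializing even though a kill was requested.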






[jira] [Comment Edited] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266987#comment-16266987
 ] 

lujie edited comment on YARN-7563 at 11/27/17 4:08 PM:
---

I have found the cause by analyzing the code and the logs.
[^YARN-7563.png]
The figure above shows the cause: the client submits an application and then
sends a kill command. The NM starts the container through
ContainerManagerImpl.startContainerInternal, which (1) puts the appID in the
context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns
through ResourceTrackerService.nodeHeartbeat which apps need to be cleaned up
and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl
first (2) checks whether the appID exists in the context and, if it does,
(3) sends FINISH_APPLICATION.
The bug manifests only when two conditions hold: (1) happens before (2), and
(3) happens before (4). If either condition is violated, the bug stays hidden.
I need to check the ApplicationImpl code further to determine whether
AppFinishTriggeredTransition is needed to fix this bug.
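
The ordering argument above can be replayed deterministically. This is a minimal sketch under stated assumptions: the class FinishBeforeInitSketch, its replay method, and the three-state model are hypothetical and only mirror the numbered steps (1) to (4); they are not Hadoop code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of steps (1)-(4); none of these names exist in Hadoop.
class FinishBeforeInitSketch {
    enum State { NEW, INITING, FINISHING }

    /** Runs the four steps in the given order; returns true if an event is rejected. */
    static boolean replay(int... order) {
        Set<String> context = new HashSet<>();
        List<String> queue = new ArrayList<>(); // events are dispatched in send order
        String appId = "app_1";
        boolean foundInContext = false;
        for (int step : order) {
            switch (step) {
                case 1: // (1) put appID in context
                    context.add(appId);
                    break;
                case 2: // (2) check whether the appID exists in context
                    foundInContext = context.contains(appId);
                    break;
                case 3: // (3) send FINISH_APPLICATION only if the check passed
                    if (foundInContext) {
                        queue.add("FINISH_APPLICATION");
                    }
                    break;
                case 4: // (4) send INIT_APPLICATION
                    queue.add("INIT_APPLICATION");
                    break;
                default:
                    throw new IllegalArgumentException("unknown step " + step);
            }
        }
        State state = State.NEW;
        boolean rejected = false;
        for (String event : queue) {
            if (event.equals("INIT_APPLICATION") && state == State.NEW) {
                state = State.INITING;
            } else if (event.equals("FINISH_APPLICATION") && state == State.INITING) {
                state = State.FINISHING;
            } else {
                rejected = true; // corresponds to "Invalid event: ... at ..."
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println(replay(1, 2, 3, 4)); // true: both conditions hold
        System.out.println(replay(1, 4, 2, 3)); // false: (4) happens before (3)
        System.out.println(replay(2, 3, 1, 4)); // false: (2) happens before (1)
    }
}
```

Only the first ordering rejects an event, which matches the claim that violating either condition hides the bug.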


was (Author: xiaoheipangzi):
I have found the cause by analyzing the code and the logs.

!YARN-7563.png!

The figure above shows the cause: the client submits an application and then
sends a kill command. The NM starts the container through
ContainerManagerImpl.startContainerInternal, which (1) puts the appID in the
context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns
through ResourceTrackerService.nodeHeartbeat which apps need to be cleaned up
and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl
first (2) checks whether the appID exists in the context and, if it does,
(3) sends FINISH_APPLICATION.
The bug manifests only when two conditions hold: (1) happens before (2), and
(3) happens before (4). If either condition is violated, the bug stays hidden.
I need to check the ApplicationImpl code further to determine whether
AppFinishTriggeredTransition is needed to fix this bug.

> Invalid event: FINISH_APPLICATION at NEW
> 
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: lujie
> Attachments: YARN-7563.png
>
>
> When I send a kill command to an application, the NodeManager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  couldn't find container container_1511608703018_0001_01_01 while 
> processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> FINISH_APPLICATION at NEW
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> 2017-11-25 19:18:48,151 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1511608703018_0001 transitioned from NEW to INITING
> {code}
>  






[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7563:

Attachment: screenshot-1.png

> Invalid event: FINISH_APPLICATION at NEW
> 
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: lujie
> Attachments: YARN-7563.png
>
>
> When I send a kill command to an application, the NodeManager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  couldn't find container container_1511608703018_0001_01_01 while 
> processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> FINISH_APPLICATION at NEW
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> 2017-11-25 19:18:48,151 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1511608703018_0001 transitioned from NEW to INITING
> {code}
>  






[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7563:

Attachment: (was: screenshot-1.png)

> Invalid event: FINISH_APPLICATION at NEW
> 
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: lujie
> Attachments: YARN-7563.png
>
>
> When I send a kill command to an application, the NodeManager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  couldn't find container container_1511608703018_0001_01_01 while 
> processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> FINISH_APPLICATION at NEW
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> 2017-11-25 19:18:48,151 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1511608703018_0001 transitioned from NEW to INITING
> {code}
>  






[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7563:

Attachment: YARN-7563.png

> Invalid event: FINISH_APPLICATION at NEW
> 
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: lujie
> Attachments: YARN-7563.png
>
>
> When I send a kill command to an application, the NodeManager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  couldn't find container container_1511608703018_0001_01_01 while 
> processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> FINISH_APPLICATION at NEW
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> 2017-11-25 19:18:48,151 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1511608703018_0001 transitioned from NEW to INITING
> {code}
>  






[jira] [Issue Comment Deleted] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7563:

Comment: was deleted

(was: !YARN-7563.png!)

> Invalid event: FINISH_APPLICATION at NEW
> 
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: lujie
> Attachments: YARN-7563.png
>
>
> When I send a kill command to an application, the NodeManager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  couldn't find container container_1511608703018_0001_01_01 while 
> processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> FINISH_APPLICATION at NEW
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> 2017-11-25 19:18:48,151 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1511608703018_0001 transitioned from NEW to INITING
> {code}
>  






[jira] [Comment Edited] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266987#comment-16266987
 ] 

lujie edited comment on YARN-7563 at 11/27/17 4:09 PM:
---

I have found the cause by analyzing the code and the logs.
[^YARN-7563.png]
The figure above shows the cause: the client submits an application and then
sends a kill command. The NM starts the container through
ContainerManagerImpl.startContainerInternal, which (1) puts the appID in the
context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns
through ResourceTrackerService.nodeHeartbeat which apps need to be cleaned up
and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl
first (2) checks whether the appID exists in the context and, if it does,
(3) sends FINISH_APPLICATION.
The bug manifests only when two conditions hold: (1) happens before (2), and
(3) happens before (4). If either condition is violated, the bug stays hidden.
I need to check the ApplicationImpl code further to determine whether
AppFinishTriggeredTransition is needed to fix this bug.
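
One way to read the open question above is as a missing arc in the transition table. The following is a minimal sketch, assuming a table-driven state machine with hypothetical names (TransitionTableSketch, addTransition, handle, base); it is not the StateMachineFactory API and not the patch applied to YARN. It only illustrates that registering an arc for FINISH_APPLICATION at NEW removes the exception.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical table-driven state machine; not the Hadoop StateMachineFactory API.
class TransitionTableSketch {
    enum State { NEW, INITING, FINISHED }
    enum Event { INIT_APPLICATION, FINISH_APPLICATION }

    private final Map<State, Map<Event, State>> arcs = new EnumMap<>(State.class);
    private State current = State.NEW;

    // Registers one arc: in state 'from', event 'on' moves to state 'to'.
    TransitionTableSketch addTransition(State from, Event on, State to) {
        arcs.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class)).put(on, to);
        return this;
    }

    // Follows the registered arc or fails the way the quoted log does.
    State handle(Event event) {
        Map<Event, State> fromCurrent = arcs.get(current);
        State next = (fromCurrent == null) ? null : fromCurrent.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }

    // Assumed baseline: FINISH_APPLICATION is only accepted once INITING.
    static TransitionTableSketch base() {
        return new TransitionTableSketch()
            .addTransition(State.NEW, Event.INIT_APPLICATION, State.INITING)
            .addTransition(State.INITING, Event.FINISH_APPLICATION, State.FINISHED);
    }

    public static void main(String[] args) {
        try {
            base().handle(Event.FINISH_APPLICATION);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Invalid event: FINISH_APPLICATION at NEW
        }
        TransitionTableSketch patched = base()
            .addTransition(State.NEW, Event.FINISH_APPLICATION, State.FINISHED);
        System.out.println(patched.handle(Event.FINISH_APPLICATION)); // FINISHED
    }
}
```

A real fix would also have to decide what the INIT_APPLICATION event that arrives afterwards should do, which this sketch leaves open.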


was (Author: xiaoheipangzi):
I have found the cause by analyzing the code and the logs.
[^YARN-7563.png]
The figure above shows the cause: the client submits an application and then
sends a kill command. The NM starts the container through
ContainerManagerImpl.startContainerInternal, which (1) puts the appID in the
context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns
through ResourceTrackerService.nodeHeartbeat which apps need to be cleaned up
and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl
first (2) checks whether the appID exists in the context and, if it does,
(3) sends FINISH_APPLICATION.
The bug manifests only when two conditions hold: (1) happens before (2), and
(3) happens before (4). If either condition is violated, the bug stays hidden.
I need to check the ApplicationImpl code further to determine whether
AppFinishTriggeredTransition is needed to fix this bug.

> Invalid event: FINISH_APPLICATION at NEW
> 
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: lujie
> Attachments: YARN-7563.png
>
>
> When I send a kill command to an application, the NodeManager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  couldn't find container container_1511608703018_0001_01_01 while 
> processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> FINISH_APPLICATION at NEW
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> 2017-11-25 19:18:48,151 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1511608703018_0001 transitioned from NEW to INITING
> {code}
>  






[jira] [Comment Edited] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266987#comment-16266987
 ] 

lujie edited comment on YARN-7563 at 11/27/17 4:04 PM:
---

I have found the cause by analyzing the code and the logs.

The figure above shows the cause: the client submits an application and then
sends a kill command. The NM starts the container through
ContainerManagerImpl.startContainerInternal, which (1) puts the appID in the
context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns
through ResourceTrackerService.nodeHeartbeat which apps need to be cleaned up
and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl
first (2) checks whether the appID exists in the context and, if it does,
(3) sends FINISH_APPLICATION.
The bug manifests only when two conditions hold: (1) happens before (2), and
(3) happens before (4). If either condition is violated, the bug stays hidden.
I need to check the ApplicationImpl code further to determine whether
AppFinishTriggeredTransition is needed to fix this bug.


was (Author: xiaoheipangzi):
I have found the cause by analyzing the code and the logs.

The figure above shows the cause: the client submits an application and then
sends a kill command. The NM starts the container through
ContainerManagerImpl.startContainerInternal, which (1) puts the appID in the
context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns
through ResourceTrackerService.nodeHeartbeat which apps need to be cleaned up
and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl
first (2) checks whether the appID exists in the context and, if it does,
(3) sends FINISH_APPLICATION.
The bug manifests only when two conditions hold: (1) happens before (2), and
(3) happens before (4). If either condition is violated, the bug stays hidden.
I need to check the ApplicationImpl code further to determine whether
AppFinishTriggeredTransition is needed to fix this bug.

> Invalid event: FINISH_APPLICATION at NEW
> 
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: lujie
>
> When I send a kill command to an application, the NodeManager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  couldn't find container container_1511608703018_0001_01_01 while 
> processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> FINISH_APPLICATION at NEW
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> 2017-11-25 19:18:48,151 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1511608703018_0001 transitioned from NEW to INITING
> {code}
>  






[jira] [Commented] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266987#comment-16266987
 ] 

lujie commented on YARN-7563:
-

I have found the cause by analyzing the code and the logs.
!YARN-7536.png!

The figure above shows the cause: the client submits an application and then
sends a kill command. The NM starts the container through
ContainerManagerImpl.startContainerInternal, which (1) puts the appID in the
context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns
through ResourceTrackerService.nodeHeartbeat which apps need to be cleaned up
and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl
first (2) checks whether the appID exists in the context and, if it does,
(3) sends FINISH_APPLICATION.
The bug manifests only when two conditions hold: (1) happens before (2), and
(3) happens before (4). If either condition is violated, the bug stays hidden.
I need to check the ApplicationImpl code further to determine whether
AppFinishTriggeredTransition is needed to fix this bug.

> Invalid event: FINISH_APPLICATION at NEW
> 
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: lujie
> Attachments: YARN-7536.png
>
>
> When I send a kill command to an application, the NodeManager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  couldn't find container container_1511608703018_0001_01_01 while 
> processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> FINISH_APPLICATION at NEW
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> 2017-11-25 19:18:48,151 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1511608703018_0001 transitioned from NEW to INITING
> {code}
>  






[jira] [Comment Edited] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266987#comment-16266987
 ] 

lujie edited comment on YARN-7563 at 11/27/17 4:03 PM:
---

I have found the cause by analyzing the code and the logs.

The figure above shows the cause: the client submits an application and then
sends a kill command. The NM starts the container through
ContainerManagerImpl.startContainerInternal, which (1) puts the appID in the
context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns
through ResourceTrackerService.nodeHeartbeat which apps need to be cleaned up
and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl
first (2) checks whether the appID exists in the context and, if it does,
(3) sends FINISH_APPLICATION.
The bug manifests only when two conditions hold: (1) happens before (2), and
(3) happens before (4). If either condition is violated, the bug stays hidden.
I need to check the ApplicationImpl code further to determine whether
AppFinishTriggeredTransition is needed to fix this bug.


was (Author: xiaoheipangzi):
I have found the cause by analyzing the code and the logs.
!YARN-7536.png!

The figure above shows the cause: the client submits an application and then
sends a kill command. The NM starts the container through
ContainerManagerImpl.startContainerInternal, which (1) puts the appID in the
context and then (4) sends INIT_APPLICATION. Meanwhile, the NodeManager learns
through ResourceTrackerService.nodeHeartbeat which apps need to be cleaned up
and sends a FINISH_APPS event to ContainerManagerImpl. ContainerManagerImpl
first (2) checks whether the appID exists in the context and, if it does,
(3) sends FINISH_APPLICATION.
The bug manifests only when two conditions hold: (1) happens before (2), and
(3) happens before (4). If either condition is violated, the bug stays hidden.
I need to check the ApplicationImpl code further to determine whether
AppFinishTriggeredTransition is needed to fix this bug.

> Invalid event: FINISH_APPLICATION at NEW
> 
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: lujie
>
> When I send a kill command to an application, the NodeManager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  couldn't find container container_1511608703018_0001_01_01 while 
> processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> FINISH_APPLICATION at NEW
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> 2017-11-25 19:18:48,151 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1511608703018_0001 transitioned from NEW to INITING
> {code}
>  






[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7563:

Attachment: (was: YARN-7536.png)

> Invalid event: FINISH_APPLICATION at NEW
> 
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0-beta1
>Reporter: lujie
>
> I sent a kill command to the application, and the NodeManager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  couldn't find container container_1511608703018_0001_01_01 while 
> processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> FINISH_APPLICATION at NEW
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> 2017-11-25 19:18:48,151 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1511608703018_0001 transitioned from NEW to INITING
> {code}
>  






[jira] [Updated] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING

2017-11-26 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-6948:

Attachment: yarn-6948.png

> Invalid event: ATTEMPT_ADDED at FINAL_SAVING
> 
>
> Key: YARN-6948
> URL: https://issues.apache.org/jira/browse/YARN-6948
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.0
>Reporter: lujie
> Attachments: yarn-6948.png
>
>
> When I sent a kill command to a running job, I checked the logs and found 
> this exception:
> {code:java}
> 2017-08-03 01:35:20,485 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ATTEMPT_ADDED at FINAL_SAVING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Commented] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING

2017-11-27 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266480#comment-16266480
 ] 

lujie commented on YARN-6948:
-

Hi:
I recently restudied this bug and found its cause.
!yarn-6948.png!
When the applicationAttempt performs AttemptStartedTransition, it sends 
AppAttemptAddedSchedulerEvent to the CapacityScheduler and transitions to 
SUBMITTED; the CapacityScheduler then sends ATTEMPT_ADDED back to the 
applicationAttempt. But if the client sends a kill command to the 
applicationAttempt, it transitions to FINAL_SAVING, and if ATTEMPT_ADDED 
arrives before the applicationAttempt changes its state from FINAL_SAVING to 
KILLED, the applicationAttempt throws InvalidStateTransitonException. 
Everything is fine if ATTEMPT_ADDED arrives at KILLED or at SUBMITTED.
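
The window described above can be reproduced with a small table-driven sketch. This is a hand-written subset for illustration, not the real RMAppAttemptImpl transition table: the class AttemptRaceSketch and its constructor flag are hypothetical, the SCHEDULED state and the KILL and ATTEMPT_UPDATE_SAVED events are assumed names, and IllegalStateException stands in for InvalidStateTransitonException. Ignoring ATTEMPT_ADDED at FINAL_SAVING is only one possible way to close the window and is not necessarily what the attached patch does.

```java
import java.util.EnumMap;
import java.util.Map;

// Toy model only: a hand-written subset, not the real RMAppAttemptImpl transition table.
class AttemptRaceSketch {
    enum State { SUBMITTED, SCHEDULED, FINAL_SAVING, KILLED }
    enum Event { ATTEMPT_ADDED, KILL, ATTEMPT_UPDATE_SAVED }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State state = State.SUBMITTED;

    AttemptRaceSketch(boolean ignoreAttemptAddedAtFinalSaving) {
        add(State.SUBMITTED, Event.ATTEMPT_ADDED, State.SCHEDULED);
        add(State.SUBMITTED, Event.KILL, State.FINAL_SAVING);
        add(State.FINAL_SAVING, Event.ATTEMPT_UPDATE_SAVED, State.KILLED);
        add(State.KILLED, Event.ATTEMPT_ADDED, State.KILLED); // late event is ignored at KILLED
        if (ignoreAttemptAddedAtFinalSaving) {
            // Assumed fix: treat the late ATTEMPT_ADDED as a no-op while the final state is saved.
            add(State.FINAL_SAVING, Event.ATTEMPT_ADDED, State.FINAL_SAVING);
        }
    }

    private void add(State from, Event on, State to) {
        table.computeIfAbsent(from, k -> new EnumMap<>(Event.class)).put(on, to);
    }

    /** Applies the event, or throws if no transition is registered for the current state. */
    void handle(Event event) {
        Map<Event, State> row = table.get(state);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + state);
        }
        state = next;
    }

    State state() {
        return state;
    }

    public static void main(String[] args) {
        AttemptRaceSketch attempt = new AttemptRaceSketch(false);
        attempt.handle(Event.KILL); // SUBMITTED -> FINAL_SAVING
        try {
            attempt.handle(Event.ATTEMPT_ADDED); // arrives before the move to KILLED
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Constructing the sketch with the flag set to true lets the same event sequence run through to KILLED without an exception.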

> Invalid event: ATTEMPT_ADDED at FINAL_SAVING
> 
>
> Key: YARN-6948
> URL: https://issues.apache.org/jira/browse/YARN-6948
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.0
>Reporter: lujie
> Attachments: yarn-6948.png
>
>
> When I sent a kill command to a running job, I checked the logs and found 
> this exception:
> {code:java}
> 2017-08-03 01:35:20,485 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ATTEMPT_ADDED at FINAL_SAVING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Updated] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING

2017-11-27 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-6948:

Attachment: yarn-6948.txt

> Invalid event: ATTEMPT_ADDED at FINAL_SAVING
> 
>
> Key: YARN-6948
> URL: https://issues.apache.org/jira/browse/YARN-6948
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.0
>Reporter: lujie
> Attachments: yarn-6948.png, yarn-6948.txt
>
>
> When I sent a kill command to a running job, I checked the logs and found 
> this exception:
> {code:java}
> 2017-08-03 01:35:20,485 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ATTEMPT_ADDED at FINAL_SAVING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Comment Edited] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING

2017-11-27 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266480#comment-16266480
 ] 

lujie edited comment on YARN-6948 at 11/27/17 8:11 AM:
---

Hi:
I recently restudied this bug and found its cause.
!yarn-6948.png!
When the applicationAttempt performs AttemptStartedTransition, it sends 
AppAttemptAddedSchedulerEvent to the CapacityScheduler and transitions to 
SUBMITTED; the CapacityScheduler then sends ATTEMPT_ADDED back to the 
applicationAttempt. But if the client sends a kill command to the 
applicationAttempt, it transitions to FINAL_SAVING, and if ATTEMPT_ADDED 
arrives before the applicationAttempt changes its state from FINAL_SAVING to 
KILLED, the applicationAttempt throws InvalidStateTransitonException. 
Everything is fine if ATTEMPT_ADDED arrives at KILLED (where the event is 
ignored) or at SUBMITTED.


was (Author: xiaoheipangzi):
Hi:
I recently restudied this bug and found its cause.
!yarn-6948.png!
When the applicationAttempt performs AttemptStartedTransition, it sends 
AppAttemptAddedSchedulerEvent to the CapacityScheduler and transitions to 
SUBMITTED; the CapacityScheduler then sends ATTEMPT_ADDED back to the 
applicationAttempt. But if the client sends a kill command to the 
applicationAttempt, it transitions to FINAL_SAVING, and if ATTEMPT_ADDED 
arrives before the applicationAttempt changes its state from FINAL_SAVING to 
KILLED, the applicationAttempt throws InvalidStateTransitonException. 
Everything is fine if ATTEMPT_ADDED arrives at KILLED or at SUBMITTED.

> Invalid event: ATTEMPT_ADDED at FINAL_SAVING
> 
>
> Key: YARN-6948
> URL: https://issues.apache.org/jira/browse/YARN-6948
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.0
>Reporter: lujie
> Attachments: yarn-6948.png
>
>
> When I sent a kill command to a running job, I checked the logs and found 
> this exception:
> {code:java}
> 2017-08-03 01:35:20,485 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ATTEMPT_ADDED at FINAL_SAVING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Comment Edited] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING

2017-11-27 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16266487#comment-16266487
 ] 

lujie edited comment on YARN-6948 at 11/27/17 8:17 AM:
---

I downloaded the Hadoop source code from GitHub (version 3.1.0-SNAPSHOT) and 
created a patch: [^yarn-6948.txt]


was (Author: xiaoheipangzi):
[^yarn-6948.txt]

> Invalid event: ATTEMPT_ADDED at FINAL_SAVING
> 
>
> Key: YARN-6948
> URL: https://issues.apache.org/jira/browse/YARN-6948
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.0, 3.0.0-alpha4
>Reporter: lujie
> Attachments: yarn-6948.png, yarn-6948.txt
>
>
> When I sent a kill command to a running job, I checked the logs and found 
> this exception:
> {code:java}
> 2017-08-03 01:35:20,485 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ATTEMPT_ADDED at FINAL_SAVING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW may

2017-11-27 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7563:

Summary: Invalid event: FINISH_APPLICATION at NEW may  (was: Invalid event: 
FINISH_APPLICATION at NEW)

> Invalid event: FINISH_APPLICATION at NEW may
> 
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.6.0, 3.0.0-beta1
>Reporter: lujie
> Attachments: YARN-7563.png
>
>
> I sent a kill command to the application, and the NodeManager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  couldn't find container container_1511608703018_0001_01_01 while 
> processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> FINISH_APPLICATION at NEW
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> 2017-11-25 19:18:48,151 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1511608703018_0001 transitioned from NEW to INITING
> {code}
>  






[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW may make some application level resource be not cleaned

2017-11-27 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7563:

Summary: Invalid event: FINISH_APPLICATION at NEW  may make some 
application level resource be not cleaned  (was: Invalid event: 
FINISH_APPLICATION at NEW may)

> Invalid event: FINISH_APPLICATION at NEW  may make some application level 
> resource be not cleaned
> -
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.6.0, 3.0.0-beta1
>Reporter: lujie
> Attachments: YARN-7563.png
>
>
> I sent a kill command to the application, and the NodeManager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  couldn't find container container_1511608703018_0001_01_01 while 
> processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> FINISH_APPLICATION at NEW
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> 2017-11-25 19:18:48,151 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1511608703018_0001 transitioned from NEW to INITING
> {code}
>  






[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW

2017-11-27 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7563:

Affects Version/s: 2.6.0

> Invalid event: FINISH_APPLICATION at NEW
> 
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.6.0, 3.0.0-beta1
>Reporter: lujie
> Attachments: YARN-7563.png
>
>
> I sent a kill command to the application, and the NodeManager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  couldn't find container container_1511608703018_0001_01_01 while 
> processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> FINISH_APPLICATION at NEW
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> 2017-11-25 19:18:48,151 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1511608703018_0001 transitioned from NEW to INITING
> {code}
>  






[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW may make some application level resource be not cleaned

2017-11-27 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7563:

Attachment: YARN-7563.txt

> Invalid event: FINISH_APPLICATION at NEW  may make some application level 
> resource be not cleaned
> -
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.6.0, 3.0.0-beta1
>Reporter: lujie
> Attachments: YARN-7563.png, YARN-7563.txt
>
>
> I sent a kill command to the application, and the NodeManager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  couldn't find container container_1511608703018_0001_01_01 while 
> processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> FINISH_APPLICATION at NEW
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> 2017-11-25 19:18:48,151 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1511608703018_0001 transitioned from NEW to INITING
> {code}
>  






[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW may make some application level resource be not cleaned

2017-11-27 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7563:

Attachment: (was: YARN-7563.txt)

> Invalid event: FINISH_APPLICATION at NEW  may make some application level 
> resource be not cleaned
> -
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.6.0, 3.0.0-beta1
>Reporter: lujie
> Attachments: YARN-7563.png
>
>
> I sent a kill command to the application, and the NodeManager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  couldn't find container container_1511608703018_0001_01_01 while 
> processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> FINISH_APPLICATION at NEW
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> 2017-11-25 19:18:48,151 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1511608703018_0001 transitioned from NEW to INITING
> {code}
>  






[jira] [Commented] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW may make some application level resource be not cleaned

2017-11-28 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268347#comment-16268347
 ] 

lujie commented on YARN-7563:
-

I just attached a patch that contains a unit test demonstrating this bug. I 
also tried to fix it based on the existing code, but I am not sure whether my 
solution is good. Please check it and let me know how to fix it better.

> Invalid event: FINISH_APPLICATION at NEW  may make some application level 
> resource be not cleaned
> -
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.6.0, 3.0.0-beta1
>Reporter: lujie
> Attachments: YARN-7563.png, YARN-7563.txt
>
>
> I sent a kill command to the application, and the NodeManager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  couldn't find container container_1511608703018_0001_01_01 while 
> processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> FINISH_APPLICATION at NEW
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> 2017-11-25 19:18:48,151 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1511608703018_0001 transitioned from NEW to INITING
> {code}
>  
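
The failure quoted above comes down to an event arriving in a state that has no 
registered transition for it. A minimal sketch of that situation, assuming a toy 
table-driven state machine: this is not Hadoop's StateMachineFactory, the class 
name InvalidEventSketch and the CLEANED_UP state are hypothetical, and the extra 
transition at the end is only one possible remedy, not the attached patch.

```java
import java.util.EnumMap;
import java.util.Map;

// Toy table-driven state machine; NOT Hadoop's StateMachineFactory.
class InvalidEventSketch {
    enum State { NEW, INITING, CLEANED_UP }   // CLEANED_UP is a hypothetical state
    enum Event { INIT_APPLICATION, FINISH_APPLICATION }

    private final Map<State, Map<Event, State>> transitions = new EnumMap<>(State.class);
    private State current = State.NEW;

    // Register: in state `from`, event `on` moves the machine to `to`.
    void addTransition(State from, Event on, State to) {
        transitions.computeIfAbsent(from, k -> new EnumMap<>(Event.class)).put(on, to);
    }

    // Apply an event; an unregistered (state, event) pair is rejected.
    State handle(Event event) {
        Map<Event, State> row = transitions.get(current);
        if (row == null || !row.containsKey(event)) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = row.get(event);
        return current;
    }

    public static void main(String[] args) {
        InvalidEventSketch app = new InvalidEventSketch();
        app.addTransition(State.NEW, Event.INIT_APPLICATION, State.INITING);
        try {
            // The kill arrives before the application has been initialized.
            app.handle(Event.FINISH_APPLICATION);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        // One possible remedy (an assumption): handle the event in NEW as well,
        // so that cleanup can still run.
        app.addTransition(State.NEW, Event.FINISH_APPLICATION, State.CLEANED_UP);
        System.out.println(app.handle(Event.FINISH_APPLICATION));
    }
}
```

Because the rejected event leaves the state unchanged, any cleanup tied to that 
event never runs, which is why the issue title warns about resources not being 
cleaned.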






[jira] [Updated] (YARN-7563) Invalid event: FINISH_APPLICATION at NEW may make some application level resource be not cleaned

2017-11-27 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7563:

Attachment: YARN-7563.txt

> Invalid event: FINISH_APPLICATION at NEW  may make some application level 
> resource be not cleaned
> -
>
> Key: YARN-7563
> URL: https://issues.apache.org/jira/browse/YARN-7563
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.6.0, 3.0.0-beta1
>Reporter: lujie
> Attachments: YARN-7563.png, YARN-7563.txt
>
>
> I sent a kill command to the application, and the nodemanager log shows:
> {code:java}
> 2017-11-25 19:18:48,126 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  couldn't find container container_1511608703018_0001_01_01 while 
> processing FINISH_CONTAINERS event
> 2017-11-25 19:18:48,146 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> FINISH_APPLICATION at NEW
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:627)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:75)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1508)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1501)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
> at java.lang.Thread.run(Thread.java:745)
> 2017-11-25 19:18:48,151 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
>  Application application_1511608703018_0001 transitioned from NEW to INITING
> {code}
>  






[jira] [Commented] (YARN-6950) Invalid event: LAUNCH_FAILED at FAILED

2017-12-15 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16293626#comment-16293626
 ] 

lujie commented on YARN-6950:
-

Hi, I found that this bug is a duplicate of YARN-933.

> Invalid event: LAUNCH_FAILED at FAILED
> --
>
> Key: YARN-6950
> URL: https://issues.apache.org/jira/browse/YARN-6950
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.6.0
>Reporter: lujie
> Fix For: 2.7.0
>
>
> An RMAppAttemptImpl fails for some reason; meanwhile, the AM fails to launch a 
> container and sends the LAUNCH_FAILED event, and the state machine cannot 
> handle it:
> {code:java}
> 2017-07-05 03:33:09,013 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> LAUNCH_FAILED at FAILED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Updated] (YARN-6950) Invalid event: LAUNCH_FAILED at FAILED

2017-12-15 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-6950:

Fix Version/s: 2.7.0

> Invalid event: LAUNCH_FAILED at FAILED
> --
>
> Key: YARN-6950
> URL: https://issues.apache.org/jira/browse/YARN-6950
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.6.0
>Reporter: lujie
> Fix For: 2.7.0
>
>
> An RMAppAttemptImpl fails for some reason; meanwhile, the AM fails to launch a 
> container and sends the LAUNCH_FAILED event, and the state machine cannot 
> handle it:
> {code:java}
> 2017-07-05 03:33:09,013 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> LAUNCH_FAILED at FAILED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Resolved] (YARN-6950) Invalid event: LAUNCH_FAILED at FAILED

2017-12-15 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie resolved YARN-6950.
-
Resolution: Duplicate

> Invalid event: LAUNCH_FAILED at FAILED
> --
>
> Key: YARN-6950
> URL: https://issues.apache.org/jira/browse/YARN-6950
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.6.0
>Reporter: lujie
> Fix For: 2.7.0
>
>
> An RMAppAttemptImpl fails for some reason; meanwhile, the AM fails to launch a 
> container and sends the LAUNCH_FAILED event, and the state machine cannot 
> handle it:
> {code:java}
> 2017-07-05 03:33:09,013 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> LAUNCH_FAILED at FAILED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Updated] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING

2018-05-23 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-6948:

Priority: Major  (was: Minor)

> Invalid event: ATTEMPT_ADDED at FINAL_SAVING
> 
>
> Key: YARN-6948
> URL: https://issues.apache.org/jira/browse/YARN-6948
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.0, 3.0.0-alpha4
>Reporter: lujie
>Assignee: lujie
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: YARN-6948_1.patch, YARN-6948_2.patch, yarn-6948.png, 
> yarn-6948.txt
>
>
> When I sent a kill command to a running job, I checked the logs and found 
> this exception:
> {code:java}
> 2017-08-03 01:35:20,485 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ATTEMPT_ADDED at FINAL_SAVING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-05-23 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7663:

Priority: Major  (was: Minor)

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Major
>  Labels: patch
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, 
> YARN-7663_4.patch, YARN-7663_5.patch, YARN-7663_6.patch, YARN-7663_7.patch
>
>
> After sending a kill command to the application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted just before the point where the START event is created, 
> this bug reproduces deterministically.
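
The ordering described above can be sketched in isolation. This is a minimal 
illustration, assuming a toy single-consumer event queue rather than the real 
AsyncDispatcher or RMAppImpl; the class StartAfterKillSketch, its states, and 
the latch are hypothetical. The latch stands in for the inserted sleep: it holds 
back creation of the START event until the KILL event has been queued.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

// Toy model of "START is created late, so KILL is dispatched first".
class StartAfterKillSketch {
    enum State { NEW, SUBMITTED, KILLED }
    enum Event { START, KILL }

    static List<String> replay(boolean delayStart) {
        BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
        CountDownLatch killQueued = new CountDownLatch(1);

        // Submitter thread: creates the START event, optionally after a delay.
        Thread submitter = new Thread(() -> {
            try {
                if (delayStart) {
                    killQueued.await();   // stands in for the inserted sleep
                }
                queue.put(Event.START);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        try {
            submitter.start();
            if (!delayStart) {
                submitter.join();         // normal case: START is queued first
            }
            queue.put(Event.KILL);        // the user's kill command
            killQueued.countDown();
            submitter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        // Single consumer: apply the events in the order they were queued.
        List<String> log = new ArrayList<>();
        State state = State.NEW;
        Event event;
        while ((event = queue.poll()) != null) {
            if (event == Event.START && state == State.NEW) {
                state = State.SUBMITTED;
                log.add("START -> SUBMITTED");
            } else if (event == Event.KILL && state != State.KILLED) {
                state = State.KILLED;
                log.add("KILL -> KILLED");
            } else {
                log.add("Invalid event: " + event + " at " + state);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(replay(false));
        System.out.println(replay(true));
    }
}
```

With delayStart set to true the KILL event always wins the race, so the late 
START is rejected every time instead of only occasionally.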






[jira] [Updated] (YARN-7786) NullPointerException while launching ApplicationMaster

2018-05-23 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7786:

Priority: Major  (was: Minor)

> NullPointerException while launching ApplicationMaster
> --
>
> Key: YARN-7786
> URL: https://issues.apache.org/jira/browse/YARN-7786
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-beta1
>Reporter: lujie
>Assignee: lujie
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: YARN-7786.patch, YARN-7786_1.patch, YARN-7786_2.patch, 
> YARN-7786_3.patch, YARN-7786_4.patch, YARN-7786_5.patch, YARN-7786_6.patch, 
> resourcemanager.log
>
>
> If a kill command is sent to the job before the ApplicationMaster is launched, 
> a NullPointerException appears:
> {code}
> 2017-11-25 21:27:25,333 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error 
> launching appattempt_1511616410268_0001_01. Got exception: 
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.setupTokens(AMLauncher.java:205)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.createAMContainerLaunchContext(AMLauncher.java:193)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:112)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:304)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Updated] (YARN-8381) Job got stuck while node was unhealthy, but without log messages to indicate such case

2018-05-31 Thread lujie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-8381:

Description: 
I started a fresh pseudo-distributed system on a node, then ran a job, but it 
got stuck. My first reaction was to check the log messages to locate the 
problem, but I found no error message.

After reading log messages for a long time, it occurred to me to check the node 
health. The YARN web UI showed that the nodemanager was unhealthy, due to 
"local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured 
"{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
to 98%, which solved the problem.

{color:#d04437}*But I still strongly recommend adding error log messages for an 
unhealthy nodemanager (especially startup).*{color}

  was:
I started a fresh pseudo-distributed system on a node, then ran a job, but it 
got stuck. My first reaction was to check the log messages to locate the 
problem, but I found no error message.

After reading log messages for a long time, it occurred to me to check the node 
health. The YARN web UI showed that the nodemanager was unhealthy, due to 
"local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured 
"{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
to 98%, which solved the problem.

{color:#d04437}*But I still strongly recommend adding error log messages for an 
unhealthy nodemanager (especially for startup).*{color}


> Job got stuck while node was unhealthy, but without log messages to indicate 
> such case
> --
>
> Key: YARN-8381
> URL: https://issues.apache.org/jira/browse/YARN-8381
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: lujie
>Priority: Major
>
> I started a fresh pseudo-distributed system on a node, then ran a job, but it 
> got stuck. My first reaction was to check the log messages to locate the 
> problem, but I found no error message.
> After reading log messages for a long time, it occurred to me to check the 
> node health. The YARN web UI showed that the nodemanager was unhealthy, due to 
> "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured 
> "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
> to 98%, which solved the problem.
> {color:#d04437}*But I still strongly recommend adding error log messages for 
> an unhealthy nodemanager (especially startup).*{color}
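
The request above amounts to comparing a directory's disk utilization with the 
configured maximum and saying so loudly when the limit is exceeded. A minimal 
sketch under stated assumptions: this is not the NodeManager's actual 
disk-health checker, the class DiskHealthSketch and its methods are 
hypothetical, and the 98.0 threshold simply mirrors the value from the report.

```java
import java.io.File;
import java.util.Locale;

// Toy utilization check; the real NodeManager checker does more than this.
class DiskHealthSketch {
    // Percentage of the partition in use; an unknown size is treated as full.
    static double usedPercent(long totalBytes, long usableBytes) {
        if (totalBytes <= 0) {
            return 100.0;
        }
        return 100.0 * (totalBytes - usableBytes) / totalBytes;
    }

    // Returns an error line when the directory exceeds the limit, else null.
    static String check(String dir, long totalBytes, long usableBytes, double maxPercent) {
        double used = usedPercent(totalBytes, usableBytes);
        if (used <= maxPercent) {
            return null;
        }
        return String.format(Locale.ROOT,
            "ERROR local-dir %s is bad: used %.1f%% > max %.1f%%", dir, used, maxPercent);
    }

    public static void main(String[] args) {
        // The directory defaults to /tmp purely for illustration.
        File dir = new File(args.length > 0 ? args[0] : "/tmp");
        double maxPercent = 98.0;   // value taken from the report above
        String problem = check(dir.getPath(), dir.getTotalSpace(),
            dir.getUsableSpace(), maxPercent);
        System.out.println(problem == null ? "local-dir looks healthy" : problem);
    }
}
```

Emitting a line like this at startup, and whenever a directory turns bad, would 
point a user at the cause without a trip to the web UI.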






[jira] [Updated] (YARN-8381) Job got stuck while node was unhealthy, but without log messages to indicate such case

2018-05-31 Thread lujie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-8381:

Description: 
I started a fresh pseudo-distributed system on a node, then ran a job, but it 
got stuck. My first reaction was to check the log messages to locate the 
problem, but I found no error message.

After reading log messages for a long time, it occurred to me to check the node 
health. The YARN web UI showed that the nodemanager was unhealthy, due to 
"local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured 
"{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
to 98%, which solved the problem.

{color:#d04437}*But I still strongly recommend adding error log messages for an 
unhealthy nodemanager (especially for startup).*{color}

  was:
I started a fresh pseudo-distributed system on a node, then ran a job, but it 
got stuck. My first reaction was to check the log messages to locate the 
problem, but I found no error message.

After reading log messages for a long time, it occurred to me to check the node 
health. The YARN web UI showed that the nodemanager was unhealthy, due to 
"local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured 
"{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
to 98%, which solved the problem.

{color:#d04437}*But I still strongly recommend adding error log messages for an 
unhealthy nodemanager.*{color}


> Job got stuck while node was unhealthy, but without log messages to indicate 
> such case
> --
>
> Key: YARN-8381
> URL: https://issues.apache.org/jira/browse/YARN-8381
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: lujie
>Priority: Major
>
> I started a fresh pseudo-distributed system on a node, then ran a job, but it 
> got stuck. My first reaction was to check the log messages to locate the 
> problem, but I found no error message.
> After reading log messages for a long time, it occurred to me to check the 
> node health. The YARN web UI showed that the nodemanager was unhealthy, due to 
> "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured 
> "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
> to 98%, which solved the problem.
> {color:#d04437}*But I still strongly recommend adding error log messages for 
> an unhealthy nodemanager (especially for startup).*{color}






[jira] [Created] (YARN-8381) Job get stuck while node is unhealthy, but without log messages to indicate such case

2018-05-30 Thread lujie (JIRA)
lujie created YARN-8381:
---

 Summary: Job get stuck while node is unhealthy, but without log 
messages to indicate such case
 Key: YARN-8381
 URL: https://issues.apache.org/jira/browse/YARN-8381
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: lujie


I started a fresh pseudo-distributed system on a node, then ran a job, but it 
got stuck. My first reaction was to check the log messages to locate the 
problem, but I found no error message. Only after reading log messages for a 
long time did it occur to me to check the node health. The YARN web UI showed 
that the nodemanager was unhealthy, due to the "{{local-dirs are bad: 
/tmp/hadoop-hduser/nm-local-dir}}". I reconfigured 
"{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
to 98%, which solved the problem. But I still strongly recommend adding error 
log messages for an unhealthy nodemanager.






[jira] [Updated] (YARN-8381) Job got stuck while node is unhealthy, but without log messages to indicate such case

2018-05-30 Thread lujie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-8381:

Summary: Job got stuck while node is unhealthy, but without log messages to 
indicate such case  (was: Job get stuck while node is unhealthy, but without 
log messages to indicate such case)

> Job got stuck while node is unhealthy, but without log messages to indicate 
> such case
> -
>
> Key: YARN-8381
> URL: https://issues.apache.org/jira/browse/YARN-8381
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: lujie
>Priority: Major
>
> I started a fresh pseudo-distributed system on a node, then ran a job, but it 
> got stuck. My first reaction was to check the log messages to locate the 
> problem, but I found no error message. Only after reading log messages for a 
> long time did it occur to me to check the node health. The YARN web UI showed 
> that the nodemanager was unhealthy, due to the "{{local-dirs are bad: 
> /tmp/hadoop-hduser/nm-local-dir}}". I reconfigured 
> "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
> to 98%, which solved the problem. But I still strongly recommend adding error 
> log messages for an unhealthy nodemanager.






[jira] [Updated] (YARN-8381) Job got stuck while node was unhealthy, but without log messages to indicate such case

2018-05-30 Thread lujie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-8381:

Description: 
I started a fresh pseudo-distributed system on a node, then ran a job, but it 
got stuck. My first reaction was to check the log messages to locate the 
problem, but I found no error message.

After reading log messages for a long time, it occurred to me to check the node 
health. The YARN web UI showed that the nodemanager was unhealthy, due to the 
"{{local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigured 
"{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
to 98%, which solved the problem. But I still strongly recommend adding error 
log messages for an unhealthy nodemanager.

  was: I started a fresh pseudo-distributed system on a node, then ran a job, 
but it got stuck. My first reaction was to check the log messages to locate the 
problem, but I found no error message. Only after reading log messages for a 
long time did it occur to me to check the node health. The YARN web UI showed 
that the nodemanager was unhealthy, due to the "{{local-dirs are bad: 
/tmp/hadoop-hduser/nm-local-dir}}". I reconfigured 
"{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
to 98%, which solved the problem. But I still strongly recommend adding error 
log messages for an unhealthy nodemanager.


> Job got stuck while node was unhealthy, but without log messages to indicate 
> such case
> --
>
> Key: YARN-8381
> URL: https://issues.apache.org/jira/browse/YARN-8381
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: lujie
>Priority: Major
>
> I started a fresh pseudo-distributed system on a node, then ran a job, but it 
> got stuck. My first reaction was to check the log messages to locate the 
> problem, but I found no error message.
> After reading log messages for a long time, it occurred to me to check the 
> node health. The YARN web UI showed that the nodemanager was unhealthy, due to 
> the "{{local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigured 
> "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
> to 98%, which solved the problem. But I still strongly recommend adding error 
> log messages for an unhealthy nodemanager.






[jira] [Updated] (YARN-8381) Job got stuck while node was unhealthy, but without log messages to indicate such case

2018-05-30 Thread lujie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-8381:

Summary: Job got stuck while node was unhealthy, but without log messages 
to indicate such case  (was: Job got stuck while node is unhealthy, but without 
log messages to indicate such case)

> Job got stuck while node was unhealthy, but without log messages to indicate 
> such case
> --
>
> Key: YARN-8381
> URL: https://issues.apache.org/jira/browse/YARN-8381
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: lujie
>Priority: Major
>
> I started a fresh pseudo-distributed system on a node, then ran a job, but it 
> got stuck. My first reaction was to check the log messages to locate the 
> problem, but I found no error message. Only after reading log messages for a 
> long time did it occur to me to check the node health. The YARN web UI showed 
> that the nodemanager was unhealthy, due to the "{{local-dirs are bad: 
> /tmp/hadoop-hduser/nm-local-dir}}". I reconfigured 
> "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
> to 98%, which solved the problem. But I still strongly recommend adding error 
> log messages for an unhealthy nodemanager.






[jira] [Updated] (YARN-8381) Job got stuck while node was unhealthy, but without log messages to indicate such case

2018-05-30 Thread lujie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-8381:

Description: 
I started a fresh pseudo-distributed system on a node, then ran a job, but it 
got stuck. My first reaction was to check the log messages to locate the 
problem, but I found no error message.

After reading log messages for a long time, it occurred to me to check the node 
health. The YARN web UI showed that the nodemanager was unhealthy, due to 
"local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured 
"{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
to 98%, which solved the problem.

{color:#d04437}*But I still strongly recommend adding error log messages for an 
unhealthy nodemanager.*{color}

  was:
I started a fresh pseudo-distributed system on a node, then ran a job, but it 
got stuck. My first reaction was to check the log messages to locate the 
problem, but I found no error message.

After reading log messages for a long time, it occurred to me to check the node 
health. The YARN web UI showed that the nodemanager was unhealthy, due to the 
"local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured 
"{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
to 98%, which solved the problem.

{color:#d04437}*But I still strongly recommend adding error log messages for an 
unhealthy nodemanager.*{color}


> Job got stuck while node was unhealthy, but without log messages to indicate 
> such case
> --
>
> Key: YARN-8381
> URL: https://issues.apache.org/jira/browse/YARN-8381
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: lujie
>Priority: Major
>
> I started a fresh pseudo-distributed system on a node and ran a job, but it 
> got stuck. My first reaction was to check the log messages to locate the 
> problem, but I found no error message.
> After reading log messages for a long time, it occurred to me to check the 
> node health. The YARN web UI showed that the NodeManager was unhealthy, due to 
> "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured 
> "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
> to 98%, which solved the problem.
> {color:#d04437}*But I still strongly recommend adding error log messages for 
> an unhealthy NodeManager.*{color}






[jira] [Updated] (YARN-8381) Job got stuck while node was unhealthy, but without log messages to indicate such case

2018-05-30 Thread lujie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-8381:

Description: 
I started a fresh pseudo-distributed system on a node and ran a job, but it 
got stuck. My first reaction was to check the log messages to locate the 
problem, but I found no error message.

After reading log messages for a long time, it occurred to me to check the node 
health. The YARN web UI showed that the NodeManager was unhealthy, due to the 
"{{local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigured 
"{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
to 98%, which solved the problem.

{color:#d04437}*But I still strongly recommend adding error log messages for an 
unhealthy NodeManager.*{color}

  was:
I started a fresh pseudo-distributed system on a node and ran a job, but it 
got stuck. My first reaction was to check the log messages to locate the 
problem, but I found no error message.

After reading log messages for a long time, it occurred to me to check the node 
health. The YARN web UI showed that the NodeManager was unhealthy, due to the 
"{{local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigured 
"{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
to 98%, which solved the problem. But I still strongly recommend adding error 
log messages for an unhealthy NodeManager.


> Job got stuck while node was unhealthy, but without log messages to indicate 
> such case
> --
>
> Key: YARN-8381
> URL: https://issues.apache.org/jira/browse/YARN-8381
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: lujie
>Priority: Major
>
> I started a fresh pseudo-distributed system on a node and ran a job, but it 
> got stuck. My first reaction was to check the log messages to locate the 
> problem, but I found no error message.
> After reading log messages for a long time, it occurred to me to check the 
> node health. The YARN web UI showed that the NodeManager was unhealthy, due to 
> the "{{local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigured 
> "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
> to 98%, which solved the problem.
> {color:#d04437}*But I still strongly recommend adding error log messages for 
> an unhealthy NodeManager.*{color}






[jira] [Updated] (YARN-8381) Job got stuck while node was unhealthy, but without log messages to indicate such case

2018-05-30 Thread lujie (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-8381:

Description: 
I started a fresh pseudo-distributed system on a node and ran a job, but it 
got stuck. My first reaction was to check the log messages to locate the 
problem, but I found no error message.

After reading log messages for a long time, it occurred to me to check the node 
health. The YARN web UI showed that the NodeManager was unhealthy, due to the 
"local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured 
"{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
to 98%, which solved the problem.

{color:#d04437}*But I still strongly recommend adding error log messages for an 
unhealthy NodeManager.*{color}

  was:
I started a fresh pseudo-distributed system on a node and ran a job, but it 
got stuck. My first reaction was to check the log messages to locate the 
problem, but I found no error message.

After reading log messages for a long time, it occurred to me to check the node 
health. The YARN web UI showed that the NodeManager was unhealthy, due to the 
"{{local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir}}". I reconfigured 
"{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
to 98%, which solved the problem.

{color:#d04437}*But I still strongly recommend adding error log messages for an 
unhealthy NodeManager.*{color}


> Job got stuck while node was unhealthy, but without log messages to indicate 
> such case
> --
>
> Key: YARN-8381
> URL: https://issues.apache.org/jira/browse/YARN-8381
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: lujie
>Priority: Major
>
> I started a fresh pseudo-distributed system on a node and ran a job, but it 
> got stuck. My first reaction was to check the log messages to locate the 
> problem, but I found no error message.
> After reading log messages for a long time, it occurred to me to check the 
> node health. The YARN web UI showed that the NodeManager was unhealthy, due to 
> the "local-dirs are bad: /tmp/hadoop-hduser/nm-local-dir". I reconfigured 
> "{{yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage}}"
> to 98%, which solved the problem.
> {color:#d04437}*But I still strongly recommend adding error log messages for 
> an unhealthy NodeManager.*{color}






[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-05 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7663:

Attachment: YARN-7663_5.patch

Hi:

{quote}
Rather than calling createNewTestApp then throwing away the results, it would 
be cleaner to extend createNewTestApp to take a boolean parameter specifying 
whether to create an app with invalid state transition detection or without. 
Alternatively you could factor out the rmContext, scheduler, and conf setup 
from createNewTestApp so the test can leverage it without needing to do all the 
other, unrelated stuff in createNewTestApp.
{quote}

I implemented both plans and chose the second one, because it adds less code 
and is cleaner. In the new patch, I replaced as many as possible of the 
unrelated arguments passed to the RMAppImpl constructor with null.

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, 
> YARN-7663_4.patch, YARN-7663_5.patch
>
>
> Send kill to application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted just before the START event is created, this bug 
> reproduces deterministically.






[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-06 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314450#comment-16314450
 ] 

lujie edited comment on YARN-7663 at 1/6/18 8:25 AM:
-

Changes relative to YARN-7663_5.patch:
1. Replace fooTestAppNewKill with testAppStartAfterKilled at line 61
2. Fix checkstyle error


was (Author: xiaoheipangzi):
Changes relative to YARN-7663_5.patch:
1. Replace fooTestAppNewKill with testAppStartAfterKilled
2. Fix checkstyle error

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, 
> YARN-7663_4.patch, YARN-7663_5.patch, YARN-7663_6.patch
>
>
> Send kill to application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted just before the START event is created, this bug 
> reproduces deterministically.






[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-06 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16314450#comment-16314450
 ] 

lujie edited comment on YARN-7663 at 1/6/18 8:25 AM:
-

Changes relative to YARN-7663_5.patch:
1. Replace fooTestAppNewKill with testAppStartAfterKilled
2. Fix checkstyle error


was (Author: xiaoheipangzi):
Changes relative to YARN-7663_6.patch:
1. Replace fooTestAppNewKill with testAppStartAfterKilled
2. Fix checkstyle error

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, 
> YARN-7663_4.patch, YARN-7663_5.patch, YARN-7663_6.patch
>
>
> Send kill to application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted just before the START event is created, this bug 
> reproduces deterministically.






[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-06 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7663:

Attachment: YARN-7663_6.patch

Changes relative to YARN-7663_6.patch:
1. Replace fooTestAppNewKill with testAppStartAfterKilled
2. Fix checkstyle error

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, 
> YARN-7663_4.patch, YARN-7663_5.patch, YARN-7663_6.patch
>
>
> Send kill to application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted just before the START event is created, this bug 
> reproduces deterministically.






[jira] [Updated] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING

2018-01-08 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-6948:

Attachment: YARN-6948_2.patch

> Invalid event: ATTEMPT_ADDED at FINAL_SAVING
> 
>
> Key: YARN-6948
> URL: https://issues.apache.org/jira/browse/YARN-6948
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.0, 3.0.0-alpha4
>Reporter: lujie
> Attachments: YARN-6948_1.patch, YARN-6948_2.patch, yarn-6948.png, 
> yarn-6948.txt
>
>
> When I sent a kill command to a running job, I checked the logs and found 
> this exception:
> {code:java}
> 2017-08-03 01:35:20,485 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ATTEMPT_ADDED at FINAL_SAVING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Comment Edited] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING

2018-01-08 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317592#comment-16317592
 ] 

lujie edited comment on YARN-6948 at 1/9/18 6:06 AM:
-

After discussing [YARN-7663|https://issues.apache.org/jira/browse/YARN-7663] with 
[#Jason Lowe], I think this bug can use the same unit test strategy. 
[^YARN-6948_1.patch] is not clean and has checkstyle errors, so I am attaching 
[^YARN-6948_2.patch].


was (Author: xiaoheipangzi):
After discussing [YARN-7663|https://issues.apache.org/jira/browse/YARN-7663] with 
[#Jason Lowe], I think this bug can use the same unit test strategy. 
YARN-6948_1.patch is not clean and has checkstyle errors, so I am attaching 
YARN-6948_2

> Invalid event: ATTEMPT_ADDED at FINAL_SAVING
> 
>
> Key: YARN-6948
> URL: https://issues.apache.org/jira/browse/YARN-6948
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.0, 3.0.0-alpha4
>Reporter: lujie
> Attachments: YARN-6948_1.patch, YARN-6948_2.patch, yarn-6948.png, 
> yarn-6948.txt
>
>
> When I sent a kill command to a running job, I checked the logs and found 
> this exception:
> {code:java}
> 2017-08-03 01:35:20,485 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ATTEMPT_ADDED at FINAL_SAVING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Comment Edited] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING

2018-01-08 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317592#comment-16317592
 ] 

lujie edited comment on YARN-6948 at 1/9/18 6:04 AM:
-

After discussing [YARN-7663|https://issues.apache.org/jira/browse/YARN-7663] with 
[#Jason Lowe], I think this bug can use the same unit test strategy. 


was (Author: xiaoheipangzi):
After discussing [YARN-7663|https://issues.apache.org/jira/browse/YARN-7663] with 
[#Jason Lowe], I think this bug can use the same unit test strategy. The only 
difference is that I override onInvalidTranstion in a separate class, 
RMAppAttemptImplForTest. 
Two checkstyle errors show up when I run it locally, but I have no idea how to 
fix them. Any suggestions?

> Invalid event: ATTEMPT_ADDED at FINAL_SAVING
> 
>
> Key: YARN-6948
> URL: https://issues.apache.org/jira/browse/YARN-6948
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.0, 3.0.0-alpha4
>Reporter: lujie
> Attachments: YARN-6948_1.patch, YARN-6948_2.patch, yarn-6948.png, 
> yarn-6948.txt
>
>
> When I sent a kill command to a running job, I checked the logs and found 
> this exception:
> {code:java}
> 2017-08-03 01:35:20,485 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ATTEMPT_ADDED at FINAL_SAVING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Comment Edited] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING

2018-01-08 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317592#comment-16317592
 ] 

lujie edited comment on YARN-6948 at 1/9/18 6:05 AM:
-

After discussing [YARN-7663|https://issues.apache.org/jira/browse/YARN-7663] with 
[#Jason Lowe], I think this bug can use the same unit test strategy. 
YARN-6948_1.patch is not clean and has checkstyle errors, so I am attaching 
YARN-6948_2


was (Author: xiaoheipangzi):
After discussing [YARN-7663|https://issues.apache.org/jira/browse/YARN-7663] with 
[#Jason Lowe], I think this bug can use the same unit test strategy. 

> Invalid event: ATTEMPT_ADDED at FINAL_SAVING
> 
>
> Key: YARN-6948
> URL: https://issues.apache.org/jira/browse/YARN-6948
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.0, 3.0.0-alpha4
>Reporter: lujie
> Attachments: YARN-6948_1.patch, YARN-6948_2.patch, yarn-6948.png, 
> yarn-6948.txt
>
>
> When I sent a kill command to a running job, I checked the logs and found 
> this exception:
> {code:java}
> 2017-08-03 01:35:20,485 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ATTEMPT_ADDED at FINAL_SAVING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Updated] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING

2018-01-08 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-6948:

Attachment: YARN-6948_1.patch

After discussing [YARN-7663|https://issues.apache.org/jira/browse/YARN-7663] with 
[#Jason Lowe], I think this bug can use the same unit test strategy. The only 
difference is that I override onInvalidTranstion in a separate class, 
RMAppAttemptImplForTest. 

> Invalid event: ATTEMPT_ADDED at FINAL_SAVING
> 
>
> Key: YARN-6948
> URL: https://issues.apache.org/jira/browse/YARN-6948
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.8.0, 3.0.0-alpha4
>Reporter: lujie
> Attachments: YARN-6948_1.patch, yarn-6948.png, yarn-6948.txt
>
>
> When I sent a kill command to a running job, I checked the logs and found 
> this exception:
> {code:java}
> 2017-08-03 01:35:20,485 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ATTEMPT_ADDED at FINAL_SAVING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}






[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-05 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7663:

Attachment: YARN-7663_4.patch

Hi:
I have moved the method that performs the assertion to a new test, as 
[#Jason Lowe] suggested.
But I am still unsure about the TODO in RMAppImpl's handle method, next to 
where I added onInvalidStateTransition. Below is the code:

{code:java}
try {
  /* keep the master in sync with the state machine */
  this.stateMachine.doTransition(event.getType(), event);
} catch (InvalidStateTransitionException e) {
  LOG.error("App: " + appID
      + " can't handle this event at current state", e);
  onInvalidStateTransition(event.getType(), oldState);
  /* TODO fail the application on the failed transition */
}
{code}

The TODO has been in the code for a long time. If it is no longer meaningful, 
it should be deleted. If it really needs to be implemented, I think the 
implementation can go in the newly added method (onInvalidStateTransition).
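
The hook pattern discussed in this comment can be sketched on its own, without any Hadoop classes. Everything below is a hypothetical stand-in (`SketchApp`, `SketchState`, `SketchEvent`, `SketchInvalidTransition`, `StrictSketchApp`); it only illustrates how a no-op `onInvalidStateTransition` in production code can be overridden in a test subclass so that an invalid transition fails the test instead of just being logged. It is not the code in the patch.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for the RM's app states and event types.
enum SketchState { NEW, RUNNING, KILLED }
enum SketchEvent { START, KILL }

// Stand-in for InvalidStateTransitionException.
class SketchInvalidTransition extends RuntimeException {
  SketchInvalidTransition(SketchEvent event, SketchState state) {
    super("Invalid event: " + event + " at " + state);
  }
}

// Stand-in for RMAppImpl: handle() logs an invalid transition, then calls a hook.
class SketchApp {
  private final Map<SketchState, Map<SketchEvent, SketchState>> table =
      new EnumMap<>(SketchState.class);
  private SketchState state = SketchState.NEW;

  SketchApp() {
    allow(SketchState.NEW, SketchEvent.START, SketchState.RUNNING);
    allow(SketchState.NEW, SketchEvent.KILL, SketchState.KILLED);
    allow(SketchState.RUNNING, SketchEvent.KILL, SketchState.KILLED);
  }

  private void allow(SketchState from, SketchEvent on, SketchState to) {
    table.computeIfAbsent(
        from, k -> new EnumMap<SketchEvent, SketchState>(SketchEvent.class))
        .put(on, to);
  }

  private SketchState doTransition(SketchEvent event) {
    Map<SketchEvent, SketchState> row = table.get(state);
    SketchState next = (row == null) ? null : row.get(event);
    if (next == null) {
      throw new SketchInvalidTransition(event, state);
    }
    return next;
  }

  void handle(SketchEvent event) {
    SketchState oldState = state;
    try {
      state = doTransition(event);
    } catch (SketchInvalidTransition e) {
      System.err.println("Can't handle this event at current state: "
          + e.getMessage());
      onInvalidStateTransition(event, oldState);
    }
  }

  // Default behaviour: nothing beyond the log line above.
  protected void onInvalidStateTransition(SketchEvent event, SketchState oldState) {
  }

  SketchState getState() {
    return state;
  }
}

// Test-only subclass: an invalid transition becomes a hard failure.
class StrictSketchApp extends SketchApp {
  @Override
  protected void onInvalidStateTransition(SketchEvent event, SketchState oldState) {
    throw new IllegalStateException("Invalid event: " + event + " at " + oldState);
  }
}

class HookDemo {
  public static void main(String[] args) {
    SketchApp app = new SketchApp();
    app.handle(SketchEvent.KILL);
    app.handle(SketchEvent.START); // logged only; the app stays KILLED
    System.out.println(app.getState());
  }
}
```

In this sketch, whatever "fail the application" would mean for the TODO could also live in the hook, which is the placement suggested above.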

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, 
> YARN-7663_4.patch
>
>
> After sending a kill to the application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: START at KILLED
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted just before the START event is created, this bug 
> reproduces deterministically.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7703) Apps killed from the NEW state are not recorded in the state store

2018-01-05 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312814#comment-16312814
 ] 

lujie edited comment on YARN-7703 at 1/5/18 10:09 AM:
--

I have an initial idea for a fix, which needs review:
When an application receives a KILL event in the NEW state, the current code uses 
AppKilledTransition, which skips storing the state. We can use 
{code:java}
new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED)
{code}
in place of AppKilledTransition, and the postState should be changed to 
FINAL_SAVING. FinalSavingTransition tells the state store to perform the store 
action, and the state store replies to the application with APP_UPDATE_SAVED.

In the unit test TestRMAppTransitions#testAppNewKill, we only need to add the line 
{color:#d04437}assertAppState(RMAppState.FINAL_SAVING, application);{color}  
just before sendAppUpdateSavedEvent is performed.

I will attach a patch after YARN-7663 is fixed. That patch should also fix 
another InvalidStateTransitionException (just noting it here).
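
To make the proposed flow concrete, here is a rough sketch as a toy state machine. It is not the real RMAppImpl code: ToyApp, ToyStateStore and the reduced State/Event enums are made up for illustration, and the store is synchronous here, whereas the real state store replies asynchronously. It only shows the ordering NEW -> FINAL_SAVING -> KILLED, with the store write happening before the final state is reached.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: names mirror the discussion but are NOT the real RMAppImpl API.
class FinalSavingSketch {
  enum State { NEW, FINAL_SAVING, KILLED }
  enum Event { KILL, APP_UPDATE_SAVED }

  /** Hypothetical stand-in for the RM state store; it only records what was saved. */
  static class ToyStateStore {
    final List<State> saved = new ArrayList<>();
    void store(State finalState) { saved.add(finalState); }
  }

  static class ToyApp {
    private State state = State.NEW;
    private State targetedFinalState;  // remembered while the save is pending
    private final ToyStateStore store;

    ToyApp(ToyStateStore store) { this.store = store; }

    State getState() { return state; }

    void handle(Event event) {
      if (state == State.NEW && event == Event.KILL) {
        // Instead of jumping straight to KILLED, ask the store to persist first.
        targetedFinalState = State.KILLED;
        store.store(targetedFinalState);
        state = State.FINAL_SAVING;
      } else if (state == State.FINAL_SAVING && event == Event.APP_UPDATE_SAVED) {
        // The store acknowledged the write; now complete the transition.
        state = targetedFinalState;
      } else {
        throw new IllegalStateException("Invalid event: " + event + " at " + state);
      }
    }
  }

  public static void main(String[] args) {
    ToyStateStore store = new ToyStateStore();
    ToyApp app = new ToyApp(store);
    app.handle(Event.KILL);
    System.out.println("after KILL: " + app.getState());
    app.handle(Event.APP_UPDATE_SAVED);
    System.out.println("after APP_UPDATE_SAVED: " + app.getState());
  }
}
```

In this toy version the extra assertion for testAppNewKill corresponds to checking for FINAL_SAVING between the two handle calls.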






was (Author: xiaoheipangzi):
I have an initial idea for a fix, which needs review:
When an application receives a KILL event in the NEW state, the current code uses 
AppKilledTransition, which skips storing the state. We can use 
{code:java}
new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED)
{code}
in place of AppKilledTransition, and the postState should be changed to 
FINAL_SAVING. FinalSavingTransition tells the state store to perform the store 
action, and the state store replies to the application with APP_UPDATE_SAVED.

In the unit test TestRMAppTransitions#testAppNewKill, we only need to add the line 
{color:#d04437}assertAppState(RMAppState.FINAL_SAVING, application);{color}  
before sendAppUpdateSavedEvent is performed.

I will attach a patch after YARN-7663 is fixed. That patch should also fix 
another InvalidStateTransitionException (just noting it here).





> Apps killed from the NEW state are not recorded in the state store
> --
>
> Key: YARN-7703
> URL: https://issues.apache.org/jira/browse/YARN-7703
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: lujie
>
> While reviewing YARN-7663 I noticed that apps killed from the NEW state skip 
> storing anything to the RM state store.  That means upon restart and recovery 
> these apps will not be recovered, so they will simply disappear.  That could 
> be surprising for users.






[jira] [Comment Edited] (YARN-7703) Apps killed from the NEW state are not recorded in the state store

2018-01-05 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312814#comment-16312814
 ] 

lujie edited comment on YARN-7703 at 1/5/18 10:08 AM:
--

I have an initial idea for a fix, which needs review:
When an application receives a KILL event in the NEW state, the current code uses 
AppKilledTransition, which skips storing the state. We can use 
{code:java}
new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED)
{code}
in place of AppKilledTransition, and the postState should be changed to 
FINAL_SAVING. FinalSavingTransition tells the state store to perform the store 
action, and the state store replies to the application with APP_UPDATE_SAVED.

In the unit test TestRMAppTransitions#testAppNewKill, we only need to add the line 
{color:#d04437}assertAppState(RMAppState.FINAL_SAVING, application);{color}  
before sendAppUpdateSavedEvent is performed.

I will attach a patch after YARN-7663 is fixed. That patch should also fix 
another InvalidStateTransitionException (just noting it here).






was (Author: xiaoheipangzi):
I have an initial idea for a fix, which needs review:
When an application receives a KILL event in the NEW state, the current code uses 
AppKilledTransition, which skips storing the state. We can use 
{code:java}
new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED)
{code}
in place of AppKilledTransition, and the postState should be changed to 
FINAL_SAVING. FinalSavingTransition tells the state store to perform the store 
action, and the state store replies to the application with APP_UPDATE_SAVED.

In the unit test TestRMAppTransitions#testAppNewKill, we only need to add the line 
assertAppState(RMAppState.FINAL_SAVING, application); 
before sendAppUpdateSavedEvent.

I will attach a patch after YARN-7663 is fixed. That patch should also fix 
another InvalidStateTransitionException (just noting it here).





> Apps killed from the NEW state are not recorded in the state store
> --
>
> Key: YARN-7703
> URL: https://issues.apache.org/jira/browse/YARN-7703
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: lujie
>
> While reviewing YARN-7663 I noticed that apps killed from the NEW state skip 
> storing anything to the RM state store.  That means upon restart and recovery 
> these apps will not be recovered, so they will simply disappear.  That could 
> be surprising for users.






[jira] [Comment Edited] (YARN-7703) Apps killed from the NEW state are not recorded in the state store

2018-01-05 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312814#comment-16312814
 ] 

lujie edited comment on YARN-7703 at 1/5/18 10:08 AM:
--

I have an initial idea for a fix, which needs review:
When an application receives a KILL event in the NEW state, the current code uses 
AppKilledTransition, which skips storing the state. We can use 
{code:java}
new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED)
{code}
in place of AppKilledTransition, and the postState should be changed to 
FINAL_SAVING. FinalSavingTransition tells the state store to perform the store 
action, and the state store replies to the application with APP_UPDATE_SAVED.

In the unit test TestRMAppTransitions#testAppNewKill, we only need to add the line 
assertAppState(RMAppState.FINAL_SAVING, application); 
before sendAppUpdateSavedEvent.

I will attach a patch after YARN-7663 is fixed. That patch should also fix 
another InvalidStateTransitionException (just noting it here).






was (Author: xiaoheipangzi):
I have an initial idea for a fix, which needs review:
When an application receives a KILL event in the NEW state, the current code uses 
AppKilledTransition, which skips storing the state. We can use 
{code:java}
new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED)
{code}
in place of AppKilledTransition, and the postState should be changed to 
FINAL_SAVING. FinalSavingTransition tells the state store to perform the store 
action, and the state store replies to the application with APP_UPDATE_SAVED.

In the unit test TestRMAppTransitions#testAppNewKill, we only need to add one line: 
assertAppState(RMAppState.FINAL_SAVING, application);

I will attach a patch after YARN-7663 is fixed. That patch should also fix 
another InvalidStateTransitionException (just noting it here).





> Apps killed from the NEW state are not recorded in the state store
> --
>
> Key: YARN-7703
> URL: https://issues.apache.org/jira/browse/YARN-7703
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: lujie
>
> While reviewing YARN-7663 I noticed that apps killed from the NEW state skip 
> storing anything to the RM state store.  That means upon restart and recovery 
> these apps will not be recovered, so they will simply disappear.  That could 
> be surprising for users.






[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-05 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312700#comment-16312700
 ] 

lujie edited comment on YARN-7663 at 1/5/18 8:28 AM:
-

Hi,
I have moved the method that performs the assert into a new test, as [#Jason Lowe] 
suggested.
But I am still unsure about the TODO in the RMAppImpl handle method, which I 
came across when adding onInvalidStateTransition. Below is the code:

{code:java}
try {
  /* keep the master in sync with the state machine */
  this.stateMachine.doTransition(event.getType(), event);
} catch (InvalidStateTransitionException e) {
  LOG.error("App: " + appID
      + " can't handle this event at current state", e);
  onInvalidStateTransition(event.getType(), oldState);
  /* TODO fail the application on the failed transition */
}
{code}

The TODO has been in the code for a long time. If it is no longer meaningful, 
it should be deleted. If it really does need to be implemented, I think the 
implementation can go in the newly added method (onInvalidStateTransition).
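
To illustrate what "placing the implementation in the hook" could look like, here is a minimal sketch on a toy state machine. Everything in it is hypothetical (ToyApp, FailingApp, InvalidTransitionException and the reduced State/Event enums are not the real YARN classes), and whether failing the application is actually the right reaction is exactly the open question above.

```java
// Toy model only: shows the hook pattern, not the real RMAppImpl code.
class InvalidTransitionHookSketch {
  enum State { NEW, RUNNING, KILLED, FAILED }
  enum Event { START, KILL }

  /** Hypothetical stand-in for InvalidStateTransitionException. */
  static class InvalidTransitionException extends RuntimeException {
    InvalidTransitionException(String message) { super(message); }
  }

  static class ToyApp {
    protected State state = State.NEW;

    State getState() { return state; }

    void handle(Event event) {
      State oldState = state;
      try {
        state = doTransition(oldState, event);
      } catch (InvalidTransitionException e) {
        System.err.println("App can't handle this event at current state: "
            + e.getMessage());
        onInvalidStateTransition(event, oldState);
      }
    }

    private static State doTransition(State current, Event event) {
      if (current == State.NEW && event == Event.START) {
        return State.RUNNING;
      }
      if ((current == State.NEW || current == State.RUNNING) && event == Event.KILL) {
        return State.KILLED;
      }
      throw new InvalidTransitionException("Invalid event: " + event + " at " + current);
    }

    /** Hook called on an invalid transition; the default does nothing. */
    protected void onInvalidStateTransition(Event event, State oldState) {
    }
  }

  /** One possible reading of the TODO: fail the app inside the hook. */
  static class FailingApp extends ToyApp {
    @Override
    protected void onInvalidStateTransition(Event event, State oldState) {
      state = State.FAILED;
    }
  }

  public static void main(String[] args) {
    ToyApp app = new FailingApp();
    app.handle(Event.KILL);
    app.handle(Event.START);  // invalid at KILLED, so the hook runs
    System.out.println(app.getState());
  }
}
```

With the default empty hook the app simply stays in its old state, which matches the current behaviour of only logging the error.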


was (Author: xiaoheipangzi):
Hi,
I have moved the method that performs the assert into a new test, as [#Jason Lowe] 
suggested.
But I am still unsure about the TODO in the RMAppImpl handle method, which I 
came across when adding onInvalidStateTransition. Below is the code:

{code:java}
try {
  /* keep the master in sync with the state machine */
  this.stateMachine.doTransition(event.getType(), event);
} catch (InvalidStateTransitionException e) {
  LOG.error("App: " + appID
      + " can't handle this event at current state", e);
  onInvalidStateTransition(event.getType(), oldState);
  /* TODO fail the application on the failed transition */
}
{code}

The TODO has been in the code for a long time. If it is no longer meaningful, 
it should be deleted. If it really does need to be implemented, I think the 
implementation can go in the newly added method (onInvalidStateTransition).

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, 
> YARN-7663_4.patch
>
>
> After sending a kill to the application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: START at KILLED
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted just before the START event is created, this bug 
> reproduces deterministically.






[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-05 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312700#comment-16312700
 ] 

lujie edited comment on YARN-7663 at 1/5/18 8:28 AM:
-

Hi,
I have moved the method that performs the assert into a new test, as [#Jason Lowe] 
suggested.
But I am still unsure about the TODO in the RMAppImpl handle method, which I 
came across when adding onInvalidStateTransition. Below is the code:

{code:java}
try {
  /* keep the master in sync with the state machine */
  this.stateMachine.doTransition(event.getType(), event);
} catch (InvalidStateTransitionException e) {
  LOG.error("App: " + appID
      + " can't handle this event at current state", e);
  onInvalidStateTransition(event.getType(), oldState);
  /* TODO fail the application on the failed transition */
}
{code}

The TODO has been in the code for a long time. If it is no longer meaningful, 
it should be deleted. If it really does need to be implemented, I think the 
implementation can go in the newly added method (onInvalidStateTransition).


was (Author: xiaoheipangzi):
Hi,
I have moved the method that performs the assert into a new test, as [#Jason Lowe] 
suggested.
But I am still unsure about the TODO in the RMAppImpl handle method, which I 
came across when adding onInvalidStateTransition. Below is the code:

{code:java}
try {
  /* keep the master in sync with the state machine */
  this.stateMachine.doTransition(event.getType(), event);
} catch (InvalidStateTransitionException e) {
  LOG.error("App: " + appID
      + " can't handle this event at current state", e);
  onInvalidStateTransition(event.getType(), oldState);
  /* TODO fail the application on the failed transition */
}
{code}

The TODO has been in the code for a long time. If it is no longer meaningful, 
it should be deleted. If it really does need to be implemented, I think the 
implementation can go in the newly added method (onInvalidStateTransition).

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, 
> YARN-7663_4.patch
>
>
> After sending a kill to the application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: START at KILLED
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted just before the START event is created, this bug 
> reproduces deterministically.






[jira] [Assigned] (YARN-7703) Apps killed from the NEW state are not recorded in the state store

2018-01-05 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie reassigned YARN-7703:
---

Assignee: lujie

> Apps killed from the NEW state are not recorded in the state store
> --
>
> Key: YARN-7703
> URL: https://issues.apache.org/jira/browse/YARN-7703
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: lujie
>
> While reviewing YARN-7663 I noticed that apps killed from the NEW state skip 
> storing anything to the RM state store.  That means upon restart and recovery 
> these apps will not be recovered, so they will simply disappear.  That could 
> be surprising for users.






[jira] [Assigned] (YARN-7703) Apps killed from the NEW state are not recorded in the state store

2018-01-05 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie reassigned YARN-7703:
---

Assignee: (was: lujie)

> Apps killed from the NEW state are not recorded in the state store
> --
>
> Key: YARN-7703
> URL: https://issues.apache.org/jira/browse/YARN-7703
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jason Lowe
>
> While reviewing YARN-7663 I noticed that apps killed from the NEW state skip 
> storing anything to the RM state store.  That means upon restart and recovery 
> these apps will not be recovered, so they will simply disappear.  That could 
> be surprising for users.






[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-05 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312700#comment-16312700
 ] 

lujie edited comment on YARN-7663 at 1/5/18 8:49 AM:
-

Hi,
I have moved the method that performs the assert into a new test, as [#Jason Lowe] 
suggested.
But I am still unsure about the TODO in the RMAppImpl handle method, which I 
came across when adding onInvalidStateTransition. Below is the code:

{code:java}
try {
  /* keep the master in sync with the state machine */
  this.stateMachine.doTransition(event.getType(), event);
} catch (InvalidStateTransitionException e) {
  LOG.error("App: " + appID
      + " can't handle this event at current state", e);
  onInvalidStateTransition(event.getType(), oldState);
  /* TODO fail the application on the failed transition */
}
{code}

The TODO has been in the code for a very long time. If it is no longer 
meaningful, it should be deleted. If it really does need to be implemented, I think 
the implementation can go in the newly added method (onInvalidStateTransition).


was (Author: xiaoheipangzi):
Hi,
I have moved the method that performs the assert into a new test, as [#Jason Lowe] 
suggested.
But I am still unsure about the TODO in the RMAppImpl handle method, which I 
came across when adding onInvalidStateTransition. Below is the code:

{code:java}
try {
  /* keep the master in sync with the state machine */
  this.stateMachine.doTransition(event.getType(), event);
} catch (InvalidStateTransitionException e) {
  LOG.error("App: " + appID
      + " can't handle this event at current state", e);
  onInvalidStateTransition(event.getType(), oldState);
  /* TODO fail the application on the failed transition */
}
{code}

The TODO has been in the code for a long time. If it is no longer meaningful, 
it should be deleted. If it really does need to be implemented, I think the 
implementation can go in the newly added method (onInvalidStateTransition).

> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, 
> YARN-7663_4.patch
>
>
> After sending a kill to the application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: START at KILLED
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted just before the START event is created, this bug 
> reproduces deterministically.






[jira] [Assigned] (YARN-7703) Apps killed from the NEW state are not recorded in the state store

2018-01-05 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie reassigned YARN-7703:
---

Assignee: lujie

> Apps killed from the NEW state are not recorded in the state store
> --
>
> Key: YARN-7703
> URL: https://issues.apache.org/jira/browse/YARN-7703
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: lujie
>
> While reviewing YARN-7663 I noticed that apps killed from the NEW state skip 
> storing anything to the RM state store.  That means upon restart and recovery 
> these apps will not be recovered, so they will simply disappear.  That could 
> be surprising for users.






[jira] [Commented] (YARN-7703) Apps killed from the NEW state are not recorded in the state store

2018-01-05 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312814#comment-16312814
 ] 

lujie commented on YARN-7703:
-

I have an initial idea for a fix, which needs review:
When an application receives a KILL event in the NEW state, the current code uses 
AppKilledTransition, which skips storing the state. We can use 
{code:java}
new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED)
{code}
in place of AppKilledTransition, and the postState should be changed to 
FINAL_SAVING. FinalSavingTransition tells the state store to perform the store 
action, and the state store replies to the application with APP_UPDATE_SAVED.

In the unit test TestRMAppTransitions#testAppNewKill, we only need to add one line: 
assertAppState(RMAppState.FINAL_SAVING, application);

I will attach a patch after YARN-7663 is fixed. That patch should also fix 
another InvalidStateTransitionException (just noting it here).





> Apps killed from the NEW state are not recorded in the state store
> --
>
> Key: YARN-7703
> URL: https://issues.apache.org/jira/browse/YARN-7703
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: lujie
>
> While reviewing YARN-7663 I noticed that apps killed from the NEW state skip 
> storing anything to the RM state store.  That means upon restart and recovery 
> these apps will not be recovered, so they will simply disappear.  That could 
> be surprising for users.






[jira] [Comment Edited] (YARN-7703) Apps killed from the NEW state are not recorded in the state store

2018-01-05 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312814#comment-16312814
 ] 

lujie edited comment on YARN-7703 at 1/5/18 10:12 AM:
--

I have an initial idea for a fix, which needs review:
When an application receives a KILL event in the NEW state, the current code uses 
AppKilledTransition, which skips storing the state. We can use 
{code:java}
new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED)
{code}
in place of AppKilledTransition, and the postState should be changed to 
FINAL_SAVING. FinalSavingTransition tells the state store to perform the store 
action, and the state store replies to the application with APP_UPDATE_SAVED, 
so the final state is KILLED.

In the unit test TestRMAppTransitions#testAppNewKill, we only need to add the line 
{color:#d04437}assertAppState(RMAppState.FINAL_SAVING, application);{color}  
just before sendAppUpdateSavedEvent is performed.

I will attach a patch after YARN-7663 is fixed. That patch should also fix 
another InvalidStateTransitionException (just noting it here).






was (Author: xiaoheipangzi):
I have an initial idea for a fix, which needs review:
When an application receives a KILL event in the NEW state, the current code uses 
AppKilledTransition, which skips storing the state. We can use 
{code:java}
new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED)
{code}
in place of AppKilledTransition, and the postState should be changed to 
FINAL_SAVING. FinalSavingTransition tells the state store to perform the store 
action, and the state store replies to the application with APP_UPDATE_SAVED.

In the unit test TestRMAppTransitions#testAppNewKill, we only need to add the line 
{color:#d04437}assertAppState(RMAppState.FINAL_SAVING, application);{color}  
just before sendAppUpdateSavedEvent is performed.

I will attach a patch after YARN-7663 is fixed. That patch should also fix 
another InvalidStateTransitionException (just noting it here).





> Apps killed from the NEW state are not recorded in the state store
> --
>
> Key: YARN-7703
> URL: https://issues.apache.org/jira/browse/YARN-7703
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: lujie
>
> While reviewing YARN-7663 I noticed that apps killed from the NEW state skip 
> storing anything to the RM state store.  That means upon restart and recovery 
> these apps will not be recovered, so they will simply disappear.  That could 
> be surprising for users.






[jira] [Comment Edited] (YARN-7703) Apps killed from the NEW state are not recorded in the state store

2018-01-05 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312814#comment-16312814
 ] 

lujie edited comment on YARN-7703 at 1/5/18 10:12 AM:
--

I have an initial idea for a fix, which needs review:
When an application receives a KILL event in the NEW state, the current code uses 
AppKilledTransition, which skips storing the state. We can use 
{code:java}
new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED)
{code}
in place of AppKilledTransition, and the postState should be changed to 
FINAL_SAVING. FinalSavingTransition tells the state store to perform the store 
action, and the state store replies to the application with APP_UPDATE_SAVED, 
after which the state changes to KILLED.

In the unit test TestRMAppTransitions#testAppNewKill, we only need to add the line 
{color:#d04437}assertAppState(RMAppState.FINAL_SAVING, application);{color}  
just before sendAppUpdateSavedEvent is performed.

I will attach a patch after YARN-7663 is fixed. That patch should also fix 
another InvalidStateTransitionException (just noting it here).






was (Author: xiaoheipangzi):
I have an initial idea for a fix, which needs review:
When an application receives a KILL event in the NEW state, the current code uses 
AppKilledTransition, which skips storing the state. We can use 
{code:java}
new FinalSavingTransition(new AppKilledTransition(), RMAppState.KILLED)
{code}
in place of AppKilledTransition, and the postState should be changed to 
FINAL_SAVING. FinalSavingTransition tells the state store to perform the store 
action, and the state store replies to the application with APP_UPDATE_SAVED, 
so the final state is KILLED.

In the unit test TestRMAppTransitions#testAppNewKill, we only need to add the line 
{color:#d04437}assertAppState(RMAppState.FINAL_SAVING, application);{color}  
just before sendAppUpdateSavedEvent is performed.

I will attach a patch after YARN-7663 is fixed. That patch should also fix 
another InvalidStateTransitionException (just noting it here).





> Apps killed from the NEW state are not recorded in the state store
> --
>
> Key: YARN-7703
> URL: https://issues.apache.org/jira/browse/YARN-7703
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Jason Lowe
>Assignee: lujie
>
> While reviewing YARN-7663 I noticed that apps killed from the NEW state skip 
> storing anything to the RM state store.  That means upon restart and recovery 
> these apps will not be recovered, so they will simply disappear.  That could 
> be surprising for users.






[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-04 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310909#comment-16310909
 ] 

lujie edited comment on YARN-7663 at 1/4/18 9:01 AM:
-

After reading Jason Lowe's useful suggestion, I rewrote the unit test and attached 
a new patch.
In this patch I do three things:
1. Add an empty protected method, onInvalidStateTransition, and call it from the 
code block where RMAppImpl handles the invalid state transition.
2. Create a new final class, RMAppImplForTest, which overrides 
onInvalidStateTransition. createNewTestApp now creates an RMAppImplForTest 
object instead of an RMAppImpl.
3. Fix this bug by ignoring the event.

But two more invalid state transitions show up while testing: 
1.testAppAcceptedFailed:APP_ACCEPTED at state 
2.testAppRunningFailed:ACCEPTED,APP_UPDATE_SAVED at state KILLED.
These two may be bugs in RMAppImpl, or they may be bugs in 
TestRMAppTransitions. Would it be better to defer them to another JIRA?

Or can we just ignore the event without a test, as was done in 
[YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?
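
For readers who have not opened the patch, the three changes can be sketched roughly as follows on a toy state machine. This is not the patch itself: ToyApp stands in for RMAppImpl, ToyAppForTest for RMAppImplForTest, and the reduced State/Event enums and the IGNORED_AT_KILLED set are assumptions made only for this illustration.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Toy model only: not the real RMAppImpl or TestRMAppTransitions code.
class IgnoreEventSketch {
  enum State { NEW, RUNNING, KILLED }
  enum Event { START, KILL }

  static class ToyApp {
    // Change 3: events that are explicitly ignored once the app is KILLED.
    private static final Set<Event> IGNORED_AT_KILLED =
        EnumSet.of(Event.START, Event.KILL);

    private State state = State.NEW;

    State getState() { return state; }

    void handle(Event event) {
      State oldState = state;
      if (oldState == State.NEW && event == Event.START) {
        state = State.RUNNING;
      } else if (oldState != State.KILLED && event == Event.KILL) {
        state = State.KILLED;
      } else if (oldState == State.KILLED && IGNORED_AT_KILLED.contains(event)) {
        // A late START (or a repeated KILL) is harmless here: stay in KILLED.
      } else {
        // Anything else is an invalid transition; report it through the hook.
        onInvalidStateTransition(event, oldState);
      }
    }

    /** Change 1: empty protected hook, called on every invalid transition. */
    protected void onInvalidStateTransition(Event event, State oldState) {
    }
  }

  /** Change 2: test-only subclass that records invalid transitions. */
  static final class ToyAppForTest extends ToyApp {
    final List<String> invalid = new ArrayList<>();

    @Override
    protected void onInvalidStateTransition(Event event, State oldState) {
      invalid.add(event + " at " + oldState);
    }
  }

  public static void main(String[] args) {
    ToyAppForTest app = new ToyAppForTest();
    app.handle(Event.KILL);   // the kill wins the race
    app.handle(Event.START);  // the late START is now ignored
    System.out.println(app.getState() + ", invalid transitions: " + app.invalid);
  }
}
```

A test can then assert that the recorded list is empty at the end, so any transition that is neither handled nor explicitly ignored makes the test fail instead of only producing a log line.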



was (Author: xiaoheipangzi):
After reading Jason Lowe's useful suggestion, I rewrote the unit test and attached 
a new patch.
In this patch I do three things:
1. Add an empty protected method, onInvalidStateTransition, and call it from the 
code block where RMAppImpl handles the invalid state transition.
2. Create a new final class, RMAppImplForTest, which overrides 
onInvalidStateTransition. createNewTestApp now creates an RMAppImplForTest 
object instead of an RMAppImpl.
3. Fix this bug by ignoring the event.

But two more invalid state transitions show up while testing: 
1.testAppAcceptedFailed:APP_ACCEPTED at state 
2.testAppRunningFailed:ACCEPTED,APP_UPDATE_SAVED at state KILLED.
These two may be bugs in RMAppImpl, or they may be bugs in 
TestRMAppTransitions. Would it be better to defer them to another JIRA?

Or can we just ignore the event without a test, as was done in 
[YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?


> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch
>
>
> After sending a kill to the application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted just before the point where the START event is
> created, this bug reproduces deterministically.
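
The race above can be sketched with a toy table-driven state machine. This is an illustration under stated assumptions, not Hadoop's StateMachineFactory: the class ToyAppStateMachine, its constructor flag, and the reduced state and event sets are all hypothetical. Delaying START so that KILL is handled first leaves no transition registered for START at KILLED, and registering a self-transition is one way to ignore the late event.

```java
import java.util.EnumMap;
import java.util.Map;

/** Toy states and events; only the names mirror the log above. */
enum AppState { NEW, NEW_SAVING, KILLED }

enum AppEvent { START, KILL }

/** Hypothetical table-driven state machine, not Hadoop's StateMachineFactory. */
class ToyAppStateMachine {
  private final Map<AppState, Map<AppEvent, AppState>> table =
      new EnumMap<>(AppState.class);
  private AppState current = AppState.NEW;

  ToyAppStateMachine(boolean ignoreStartAtKilled) {
    addTransition(AppState.NEW, AppEvent.START, AppState.NEW_SAVING);
    addTransition(AppState.NEW, AppEvent.KILL, AppState.KILLED);
    if (ignoreStartAtKilled) {
      // Self-transition: a late START leaves a KILLED app untouched.
      addTransition(AppState.KILLED, AppEvent.START, AppState.KILLED);
    }
  }

  private void addTransition(AppState from, AppEvent on, AppState to) {
    table.computeIfAbsent(from, k -> new EnumMap<AppEvent, AppState>(AppEvent.class))
        .put(on, to);
  }

  /** Applies the event, or throws if no transition is registered for it. */
  AppState handle(AppEvent event) {
    Map<AppEvent, AppState> row = table.get(current);
    AppState next = (row == null) ? null : row.get(event);
    if (next == null) {
      throw new IllegalStateException("Invalid event: " + event + " at " + current);
    }
    current = next;
    return current;
  }
}

class KillBeforeStartSketch {
  public static void main(String[] args) {
    // KILL is handled before the delayed START, as with the inserted sleep.
    ToyAppStateMachine unfixed = new ToyAppStateMachine(false);
    unfixed.handle(AppEvent.KILL);
    try {
      unfixed.handle(AppEvent.START);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // Invalid event: START at KILLED
    }

    ToyAppStateMachine fixed = new ToyAppStateMachine(true);
    fixed.handle(AppEvent.KILL);
    System.out.println(fixed.handle(AppEvent.START)); // KILLED
  }
}
```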






[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-04 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310909#comment-16310909
 ] 

lujie edited comment on YARN-7663 at 1/4/18 8:08 AM:
-

After reading Jason Lowe's useful suggestion, I rewrote the unit test and
attached the new patch.
In this patch I do three things:
1. Add an empty protected method, onInvalidStateTransition, and call it in the
code block where RMAppImpl handles InvalidStateTransitionException.
2. Create a new final class, RMAppImplForTest, which overrides
onInvalidStateTransition. createNewTestApp now creates an RMAppImplForTest
object instead of an RMAppImpl.
3. Fix this bug by ignoring the event.

However, two other invalid state transitions show up while testing:
1. testAppAcceptedFailed: APP_ACCEPTED at state ACCEPTED
2. testAppRunningFailed: APP_UPDATE_SAVED at state KILLED
These two may be bugs in RMAppImpl, or they may be bugs in
TestRMAppTransitions. Would it be better to defer them to another JIRA?

Or can we just ignore the event without a test, as in
[YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?



was (Author: xiaoheipangzi):
After reading Jason Lowe's useful suggestion, I changed the unit test and
attached the new patch.
In this patch I do three things:
1. Add an empty protected method, onInvalidStateTransition, and call it in the
code block where RMAppImpl handles InvalidStateTransitionException.
2. Create a new final class, RMAppImplForTest, which overrides
onInvalidStateTransition. createNewTestApp now creates an RMAppImplForTest
object instead of an RMAppImpl.
3. Fix this bug by ignoring the event.

However, two other invalid state transitions show up while testing:
1. testAppAcceptedFailed: APP_ACCEPTED at state ACCEPTED
2. testAppRunningFailed: APP_UPDATE_SAVED at state KILLED
These two may be bugs in RMAppImpl, or they may be bugs in
TestRMAppTransitions. Would it be better to defer them to another JIRA?

Or can we just ignore the event without a test, as in
[YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?


> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch
>
>
> After sending a kill to the application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted just before the point where the START event is
> created, this bug reproduces deterministically.






[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310909#comment-16310909
 ] 

lujie edited comment on YARN-7663 at 1/4/18 7:39 AM:
-

After reading Jason Lowe's useful suggestion, I changed the unit test and
attached the new patch.
In this patch I do three things:
1. Add an empty protected method, onInvalidStateTransition, and call it in the
code block where RMAppImpl handles InvalidStateTransitionException.
2. Create a new final class, RMAppImplForTest, which overrides
onInvalidStateTransition. createNewTestApp now creates an RMAppImplForTest
object instead of an RMAppImpl.
3. Fix this bug by ignoring the event.

However, two other invalid state transitions show up while testing:
1. testAppAcceptedFailed: APP_ACCEPTED at state ACCEPTED
2. testAppRunningFailed: APP_UPDATE_SAVED at state KILLED
These two may be bugs in RMAppImpl, or they may be bugs in
TestRMAppTransitions. Would it be better to defer them to another JIRA?

Or can we just ignore the event without a test, as in
[YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?



was (Author: xiaoheipangzi):
After reading Jason Lowe's useful suggestion, I changed the unit test and
attached the new patch.
In this patch I do three things:
1. Add an empty protected method, onInvalidStateTransition, and call it in the
code block where RMAppImpl handles InvalidStateTransitionException.
2. Create a new final class, RMAppImplForTest, which overrides
onInvalidStateTransition. createNewTestApp now creates an RMAppImplForTest
object instead of an RMAppImpl.
3. Fix this bug by ignoring the event.

However, two other invalid state transitions show up while testing:
1. testAppAcceptedFailed: APP_ACCEPTED at state ACCEPTED
2. testAppRunningFailed: APP_UPDATE_SAVED at state KILLED
These two may be bugs in RMAppImpl, or they may be bugs in
TestRMAppTransitions. Would it be better to defer them to another JIRA?

Or can we just ignore the event without a test, as in
[YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?


> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch
>
>
> After sending a kill to the application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted just before the point where the START event is
> created, this bug reproduces deterministically.






[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310909#comment-16310909
 ] 

lujie edited comment on YARN-7663 at 1/4/18 7:39 AM:
-

After reading Jason Lowe's useful suggestion, I changed the unit test and
attached the new patch.
In this patch I do three things:
1. Add an empty protected method, onInvalidStateTransition, and call it in the
code block where RMAppImpl handles InvalidStateTransitionException.
2. Create a new final class, RMAppImplForTest, which overrides
onInvalidStateTransition. createNewTestApp now creates an RMAppImplForTest
object instead of an RMAppImpl.
3. Fix this bug by ignoring the event.

However, two other invalid state transitions show up while testing:
1. testAppAcceptedFailed: APP_ACCEPTED at state ACCEPTED
2. testAppRunningFailed: APP_UPDATE_SAVED at state KILLED
These two may be bugs in RMAppImpl, or they may be bugs in
TestRMAppTransitions. Would it be better to defer them to another JIRA?

Or can we just ignore the event without a test, as in
[YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?



was (Author: xiaoheipangzi):
After reading Jason Lowe's useful suggestion, I changed the unit test and
attached the new patch.
In this patch I do three things:
1. Add an empty protected method, onInvalidStateTransition, and call it in the
code block where RMAppImpl handles InvalidStateTransitionException.
2. Create a new final class, RMAppImplForTest, which overrides
onInvalidStateTransition. createNewTestApp now creates an RMAppImplForTest
object instead of an RMAppImpl.
3. Fix this bug by ignoring the event.

However, two other invalid state transitions show up while testing:
1. testAppAcceptedFailed: APP_ACCEPTED at state ACCEPTED
2. testAppRunningFailed: APP_UPDATE_SAVED at state KILLED
These two may be bugs in RMAppImpl, or they may be bugs in
TestRMAppTransitions. Would it be better to defer them to another JIRA?

Or can we just ignore the event without a test, as in
[YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?


> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch
>
>
> After sending a kill to the application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted just before the point where the START event is
> created, this bug reproduces deterministically.






[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-03 Thread lujie (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated YARN-7663:

Attachment: YARN-7663_3.patch

After reading Jason Lowe's useful suggestion, I changed the unit test and
attached the new patch.
In this patch I do three things:
1. Add an empty protected method, onInvalidStateTransition, and call it in the
code block where RMAppImpl handles InvalidStateTransitionException.
2. Create a new final class, RMAppImplForTest, which overrides
onInvalidStateTransition. createNewTestApp now creates an RMAppImplForTest
object instead of an RMAppImpl.
3. Fix this bug by ignoring the event.

However, two other invalid state transitions show up while testing:
1. testAppAcceptedFailed: APP_ACCEPTED at state ACCEPTED
2. testAppRunningFailed: APP_UPDATE_SAVED at state KILLED
These two may be bugs in RMAppImpl, or they may be bugs in
TestRMAppTransitions. Would it be better to defer them to another JIRA?

Or can we just ignore the event without a test, as in
[YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?
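
The test hook described in this comment can be sketched roughly as follows. This is not the attached patch: HookedApp and HookedAppForTest are hypothetical stand-ins for RMAppImpl and RMAppImplForTest, the String-based events and states are simplifications, and recording the invalid transition in a list is an assumption about what a test override might do. Only the hook name onInvalidStateTransition comes from the comment.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for RMAppImpl; only the hook name is from the comment. */
class HookedApp {
  private final String state = "KILLED";

  /** Empty hook: production behavior is unchanged unless a subclass overrides it. */
  protected void onInvalidStateTransition(String event, String state) {
    // intentionally empty
  }

  public void handle(String event) {
    try {
      doTransition(event);
    } catch (IllegalStateException e) {
      // Stand-in for the block that logs "Can't handle this event at current state".
      onInvalidStateTransition(event, state);
    }
  }

  private void doTransition(String event) {
    // Toy rule for this sketch: a KILLED app accepts no events at all.
    throw new IllegalStateException("Invalid event: " + event + " at " + state);
  }
}

/** Test-only subclass that makes a swallowed invalid transition observable. */
final class HookedAppForTest extends HookedApp {
  final List<String> invalidEvents = new ArrayList<>();

  @Override
  protected void onInvalidStateTransition(String event, String state) {
    invalidEvents.add(event + " at " + state);
  }
}

class HookSketch {
  public static void main(String[] args) {
    HookedAppForTest app = new HookedAppForTest();
    app.handle("START");
    // A unit test could assert on this list, or fail as soon as the hook is called.
    System.out.println(app.invalidEvents);
  }
}
```

The point of the design is that the production class only gains an empty method, while tests get a way to notice invalid transitions that would otherwise only appear in the log.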


> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch
>
>
> After sending a kill to the application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted just before the point where the START event is
> created, this bug reproduces deterministically.






[jira] [Comment Edited] (YARN-7663) RMAppImpl:Invalid event: START at KILLED

2018-01-09 Thread lujie (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310909#comment-16310909
 ] 

lujie edited comment on YARN-7663 at 1/10/18 3:03 AM:
--

After reading Jason Lowe's useful suggestion, I rewrote the unit test and
attached the new patch.
In this patch I do three things:
1. Add an empty protected method, onInvalidStateTransition, and call it in the
code block where RMAppImpl handles InvalidStateTransitionException.
2. Create a new final class, RMAppImplForTest, which overrides
onInvalidStateTransition. createNewTestApp now creates an RMAppImplForTest
object instead of an RMAppImpl.
3. Fix this bug by ignoring the event.

However, two other invalid state transitions show up while testing:
1. testAppAcceptedFailed: APP_ACCEPTED at state ACCEPTED
2. testAppRunningFailed: APP_UPDATE_SAVED at state KILLED
These two may be bugs in RMAppImpl, or they may be bugs in
TestRMAppTransitions. Would it be better to defer them to another JIRA?

Or can we just ignore the event without a test, as in
[YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?



was (Author: xiaoheipangzi):
After reading Jason Lowe's useful suggestion, I rewrote the unit test and
attached the new patch.
In this patch I do three things:
1. Add an empty protected method, onInvalidStateTransition, and call it in the
code block where RMAppImpl handles InvalidStateTransitionException.
2. Create a new final class, RMAppImplForTest, which overrides
onInvalidStateTransition. createNewTestApp now creates an RMAppImplForTest
object instead of an RMAppImpl.
3. Fix this bug by ignoring the event.

However, two other invalid state transitions show up while testing:
1. testAppAcceptedFailed: APP_ACCEPTED at state ACCEPTED
2. testAppRunningFailed: APP_UPDATE_SAVED at state KILLED
These two may be bugs in RMAppImpl, or they may be bugs in
TestRMAppTransitions. Would it be better to defer them to another JIRA?

Or can we just ignore the event without a test, as in
[YARN-4598|https://issues.apache.org/jira/browse/YARN-4598]?


> RMAppImpl:Invalid event: START at KILLED
> 
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: lujie
>Assignee: lujie
>Priority: Minor
>  Labels: patch
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, 
> YARN-7663_4.patch, YARN-7663_5.patch, YARN-7663_6.patch, YARN-7663_7.patch
>
>
> After sending a kill to the application, the RM log shows:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> START at KILLED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted just before the point where the START event is
> created, this bug reproduces deterministically.






[jira] [Created] (YARN-7726) RMAppImpl: can't handle APP_ACCEPTED at state ACCEPTED

2018-01-09 Thread lujie (JIRA)
lujie created YARN-7726:
---

 Summary: RMAppImpl: can't handle APP_ACCEPTED at state ACCEPTED
 Key: YARN-7726
 URL: https://issues.apache.org/jira/browse/YARN-7726
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.8.3
Reporter: lujie
Priority: Minor


While adding a patch to TestRMAppTransitions, the patch triggered the error
message "can't handle APP_ACCEPTED at state ACCEPTED" in the unit tests
testAppAcceptedFailed and testAppRunningFailed.





