[
https://issues.apache.org/jira/browse/HDDS-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056664#comment-17056664
]
Siddharth Wagle edited comment on HDDS-3104 at 3/11/20, 6:16 AM:
-----------------------------------------------------------------
[~adoroszlai] I think this happens because:
1. HddsDatanodeService#stop calls DatanodeStateMachine#stopDaemon, which sets
StateContext.state = SHUTDOWN
2. The DatanodeStateMachine still calls StateContext.execute because it read the
stale state
3. StateContext.execute sets shutDownOnError to true when it sees the new
state, even though there was no error
So, I have a proposed patch for this. I am attaching it here before making a PR
and would like to know your thoughts.
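
To make the sequence in steps 1-3 concrete, here is a minimal, self-contained sketch of the interleaving. It is not the Ozone code and not the attached patch: the class ShutdownRaceSketch, its Context type, the stopRequested flag and both execute variants are hypothetical names chosen for illustration. The guarded variant shows one possible way to tell a requested stop apart from a real failure, assuming the stop path can record that it was invoked.

```java
public class ShutdownRaceSketch {
    enum State { RUNNING, SHUTDOWN }

    /** Hypothetical stand-in for StateContext; not the real Ozone class. */
    static class Context {
        private volatile State state = State.RUNNING;
        private volatile boolean stopRequested = false;   // assumption: set only by stop()
        private volatile boolean shutdownOnError = false;

        State getState() { return state; }
        boolean isShutdownOnError() { return shutdownOnError; }

        /** Step 1: an orderly stop moves the context to SHUTDOWN. */
        void stop() {
            stopRequested = true;
            state = State.SHUTDOWN;
        }

        /** Naive variant: seeing SHUTDOWN is always treated as a critical error. */
        void executeNaive() {
            if (state == State.SHUTDOWN) {
                shutdownOnError = true;
            }
        }

        /** Guarded variant: SHUTDOWN counts as an error only if nobody asked for a stop. */
        void executeGuarded() {
            if (state == State.SHUTDOWN && !stopRequested) {
                shutdownOnError = true;
            }
        }
    }

    /** Replays steps 1-3 sequentially and reports whether an error was flagged. */
    static boolean replay(boolean guarded) {
        Context ctx = new Context();
        State observed = ctx.getState();      // loop thread reads RUNNING
        ctx.stop();                           // step 1: stop() sets SHUTDOWN
        if (observed == State.RUNNING) {      // step 2: decision based on the stale read
            if (guarded) {
                ctx.executeGuarded();
            } else {
                ctx.executeNaive();           // step 3: flags an error without a failure
            }
        }
        return ctx.isShutdownOnError();
    }

    public static void main(String[] args) {
        System.out.println("naive flags error:   " + replay(false)); // true
        System.out.println("guarded flags error: " + replay(true));  // false
    }
}
```

The replay is sequential on purpose, so the outcome does not depend on thread timing. In the real datanode the stale read and the stop call happen on different threads.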
> Integration test crashes due to critical error in datanode
> ----------------------------------------------------------
>
> Key: HDDS-3104
> URL: https://issues.apache.org/jira/browse/HDDS-3104
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Attila Doroszlai
> Assignee: Siddharth Wagle
> Priority: Major
> Labels: pull-request-available
> Attachments: HDDS-3104.patch,
> org.apache.hadoop.ozone.client.rpc.TestContainerStateMachineFailureOnRead-output.txt
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> {code:title=test log}
> 2020-02-28 07:36:17,759 [Datanode State Machine Thread - 0] ERROR
> statemachine.StateContext (StateContext.java:execute(420)) - Critical error
> occurred in StateMachine, setting shutDownMachine
> ...
> 2020-02-28 07:36:21,216 [Datanode State Machine Thread - 0] INFO
> util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1:
> ExitException
> {code}
> {code:title=build output}
> [ERROR] ExecutionException The forked VM terminated without properly saying
> goodbye. VM crash or System.exit called?
> {code}
> https://github.com/adoroszlai/hadoop-ozone/runs/474218807
> https://github.com/adoroszlai/hadoop-ozone/suites/487650271/artifacts/2327174