Yiqun Lin created HDDS-3180:
-------------------------------

             Summary: Datanode shutdown due to inconsistent volume state without helpful error message
                 Key: HDDS-3180
                 URL: https://issues.apache.org/jira/browse/HDDS-3180
             Project: Hadoop Distributed Data Store
          Issue Type: Improvement
    Affects Versions: 0.4.1
            Reporter: Yiqun Lin
            Assignee: Yiqun Lin


I hit an error on my Ozone test cluster when restarting a datanode. The log 
reports an inconsistent volume state but gives no further detail about the 
cause:
{noformat}
2020-03-14 02:31:46,204 [main] INFO  (LogAdapter.java:51)     - registered UNIX signal handlers for [TERM, HUP, INT]
2020-03-14 02:31:46,736 [main] INFO  (HddsDatanodeService.java:204)     - HddsDatanodeService host:lyq-xx.xx.xx.xx ip:xx.xx.xx.xx
2020-03-14 02:31:46,784 [main] INFO  (HddsVolume.java:177)     - Creating Volume: /tmp/hadoop-hdfs/dfs/data/hdds of storage type : DISK and capacity : 20063645696
2020-03-14 02:31:46,786 [main] ERROR (MutableVolumeSet.java:202)     - Failed to parse the storage location: file:///tmp/hadoop-hdfs/dfs/data
java.io.IOException: Volume is in an INCONSISTENT state. Skipped loading volume: /tmp/hadoop-hdfs/dfs/data/hdds
        at org.apache.hadoop.ozone.container.common.volume.HddsVolume.initialize(HddsVolume.java:226)
        at org.apache.hadoop.ozone.container.common.volume.HddsVolume.<init>(HddsVolume.java:180)
        at org.apache.hadoop.ozone.container.common.volume.HddsVolume.<init>(HddsVolume.java:71)
        at org.apache.hadoop.ozone.container.common.volume.HddsVolume$Builder.build(HddsVolume.java:158)
        at org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.createVolume(MutableVolumeSet.java:336)
        at org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.initializeVolumeSet(MutableVolumeSet.java:183)
        at org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.<init>(MutableVolumeSet.java:139)
        at org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.<init>(MutableVolumeSet.java:111)
        at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.<init>(OzoneContainer.java:97)
        at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.<init>(DatanodeStateMachine.java:128)
        at org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:235)
        at org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:179)
        at org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:154)
        at org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:78)
        at picocli.CommandLine.execute(CommandLine.java:1173)
        at picocli.CommandLine.access$800(CommandLine.java:141)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:1367)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:1335)
        at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
        at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526)
        at picocli.CommandLine.parseWithHandler(CommandLine.java:1465)
        at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65)
        at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56)
        at org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.java:137)
2020-03-14 02:31:46,795 [shutdown-hook-0] INFO  (LogAdapter.java:51)     - SHUTDOWN_MSG:
{noformat}

I then looked into the code and found the root cause: the version file was 
missing on that node.
We should also log this key detail so that users can quickly identify the root 
cause.
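
As a rough illustration of the kind of message that would help, here is a minimal standalone sketch. It is not the actual HddsVolume code: the class name VolumeStateCheck, the methods findInconsistency and checkVolume, and the version file name "VERSION" are all assumptions made for this example. The idea is only that the exception text should say why the volume is considered inconsistent, for instance by naming the missing file.

```java
import java.io.File;
import java.io.IOException;

public class VolumeStateCheck {
    // Assumed file name, for illustration only; the real name is defined in the Ozone code.
    static final String VERSION_FILE = "VERSION";

    /**
     * Returns a human-readable reason when the volume root looks inconsistent,
     * or null when it is absent, empty, or contains the version file.
     */
    static String findInconsistency(File hddsRoot) {
        if (!hddsRoot.exists()) {
            return null; // nothing on disk yet, so nothing to contradict
        }
        if (!hddsRoot.isDirectory()) {
            return "volume root " + hddsRoot + " exists but is not a directory";
        }
        String[] children = hddsRoot.list();
        if (children == null) {
            return "cannot list volume root " + hddsRoot
                + " (I/O error or missing permission)";
        }
        if (children.length == 0) {
            return null; // empty directory: treated here as not yet formatted
        }
        File versionFile = new File(hddsRoot, VERSION_FILE);
        if (!versionFile.isFile()) {
            return "version file " + versionFile
                + " is missing from a non-empty volume root";
        }
        return null;
    }

    /** Throws with the detailed reason appended to the existing message. */
    static void checkVolume(File hddsRoot) throws IOException {
        String reason = findInconsistency(hddsRoot);
        if (reason != null) {
            throw new IOException("Volume is in an INCONSISTENT state. "
                + "Skipped loading volume: " + hddsRoot + ". Reason: " + reason);
        }
    }

    public static void main(String[] args) {
        // Default path taken from the log above; pass another path as the first argument.
        File root = new File(args.length > 0 ? args[0] : "/tmp/hadoop-hdfs/dfs/data/hdds");
        try {
            checkVolume(root);
            System.out.println("Volume root looks consistent: " + root);
        } catch (IOException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

With a message like this, the missing version file would be visible directly in the datanode log instead of requiring a look into the code.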


