Wei-Chiu Chuang created HDDS-4269:
-------------------------------------

             Summary: Ozone DataNode thinks a volume is failed if an unexpected file is in the HDDS root directory
                 Key: HDDS-4269
                 URL: https://issues.apache.org/jira/browse/HDDS-4269
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
          Components: Ozone Datanode
    Affects Versions: 1.1.0
            Reporter: Wei-Chiu Chuang

It took me some time to debug a trivial bug. The DataNode crashes after this mysterious error, with no explanation:

{noformat}
10:11:44.382 PM INFO MutableVolumeSet Moving Volume : /var/lib/hadoop-ozone/fake_datanode/data/hdds to failed Volumes
10:11:46.287 PM ERROR StateContext Critical error occurred in StateMachine, setting shutDownMachine
10:11:46.287 PM ERROR DatanodeStateMachine DatanodeStateMachine Shutdown due to an critical error
{noformat}

It turns out that if there are unexpected files under the hdds directory ($hdds.datanode.dir/hdds), the DN thinks the volume is bad and moves it to the failed volume list without logging the reason. I was editing the VERSION file and vim created a temp file in that directory. This is impossible to debug without reading the code.

{code:java|title=HddsVolumeUtil#checkVolume()}
    } else if(hddsFiles.length == 2) {
      // The files should be Version and SCM directory
      if (scmDir.exists()) {
        return true;
      } else {
        logger.error("Volume {} is in Inconsistent state, expected scm " +
            "directory {} does not exist", volumeRoot, scmDir
            .getAbsolutePath());
        return false;
      }
    } else {
      // The hdds root dir should always have 2 files. One is Version file
      // and other is SCM directory. <---- HERE!
      return false;
    }
{code}
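
For illustration, here is a minimal sketch of what the silent fall-through branch could report before returning false. It is not the actual Ozone code or a proposed patch: the class `HddsRootLayoutCheck`, its methods, and the `scmDirName` parameter are hypothetical names introduced for this example, and the only detail taken from the report is that the hdds root is expected to contain just the VERSION file and the SCM directory.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper, not part of Ozone: explains why an hdds root
 * directory does not match the expected two-entry layout.
 */
class HddsRootLayoutCheck {

  /** Name of the version file, as referred to in the report. */
  static final String VERSION_FILE = "VERSION";

  /**
   * Returns the names of entries under hddsRoot that are neither the
   * version file nor the expected SCM directory, sorted by name.
   * scmDirName is an assumed parameter; the real code derives scmDir itself.
   */
  static List<String> unexpectedEntries(File hddsRoot, String scmDirName) {
    List<String> unexpected = new ArrayList<>();
    String[] names = hddsRoot.list();
    if (names == null) {
      // Not a directory, or an I/O error occurred while listing it.
      return unexpected;
    }
    Arrays.sort(names);
    for (String name : names) {
      if (!name.equals(VERSION_FILE) && !name.equals(scmDirName)) {
        unexpected.add(name);
      }
    }
    return unexpected;
  }

  /** Builds a message the fall-through branch could log before failing the volume. */
  static String describe(File hddsRoot, String scmDirName) {
    return String.format(
        "Volume %s is in an inconsistent state: expected only %s and %s, but also found %s",
        hddsRoot.getAbsolutePath(), VERSION_FILE, scmDirName,
        unexpectedEntries(hddsRoot, scmDirName));
  }
}
```

With a message like this in the log, a stray editor temp file next to VERSION would be named explicitly instead of only showing up as "Moving Volume ... to failed Volumes".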