[jira] Updated: (HDFS-1220) Namenode unable to start due to truncated fstime
[ https://issues.apache.org/jira/browse/HDFS-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thanh Do updated HDFS-1220:
---------------------------

Description:
- Summary: Updating the fstime file on disk is not atomic. If a crash happens in the middle of an update, then the next time the NameNode reboots it reads a truncated (empty) fstime file and is unable to start.
- Details: Basically, this involves three steps:
  1) delete the fstime file (timeFile.delete())
  2) truncate the fstime file (new FileOutputStream(timeFile))
  3) write the new time to the fstime file (out.writeLong(checkpointTime))
  If a crash happens after step 2 and before step 3, then on the next reboot the NameNode gets an exception when reading the time (8 bytes) from the empty fstime file.

This bug was found by our Failure Testing Service framework: http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and Haryadi Gunawi (hary...@eecs.berkeley.edu)

was:
- Summary: Updating the fstime file on disk is not atomic. If a crash happens in the middle of an update, then the next time the NameNode reboots it reads a truncated (empty) fstime file and is unable to start.
- Details: Below is the code that updates the fstime file on disk:

    void writeCheckpointTime(StorageDirectory sd) throws IOException {
      if (checkpointTime < 0L)
        return; // do not write negative time
      File timeFile = getImageFile(sd, NameNodeFile.TIME);
      if (timeFile.exists()) {
        timeFile.delete();
      }
      DataOutputStream out = new DataOutputStream(
          new FileOutputStream(timeFile));
      try {
        out.writeLong(checkpointTime);
      } finally {
        out.close();
      }
    }

  Basically, this involves three steps:
  1) delete the fstime file (timeFile.delete())
  2) truncate the fstime file (new FileOutputStream(timeFile))
  3) write the new time to the fstime file (out.writeLong(checkpointTime))
  If a crash happens after step 2 and before step 3, then on the next reboot the NameNode gets an exception when reading the time (8 bytes) from the empty fstime file.

This bug was found by our Failure Testing Service framework: http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and Haryadi Gunawi (hary...@eecs.berkeley.edu)

Namenode unable to start due to truncated fstime
------------------------------------------------
Key: HDFS-1220
URL: https://issues.apache.org/jira/browse/HDFS-1220
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.20.1
Reporter: Thanh Do

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
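The crash window between step 2 and step 3 can be reproduced outside the NameNode with a small standalone program. This is a sketch under stated assumptions: the class and method names (`FstimeCrashWindow`, `updateUntilCrash`, `readCheckpointTime`) are hypothetical, the "crash" is simulated by simply never performing step 3, and the reader is assumed to read the time with `DataInputStream.readLong`, mirroring the writer's `out.writeLong(checkpointTime)`. It shows that after steps 1 and 2 the file is empty, so reading the 8-byte time fails with `EOFException`.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class FstimeCrashWindow {
  // Hypothetical helper: performs steps 1 and 2 of the update, then "crashes"
  // by returning before step 3 (writeLong) ever runs.
  static void updateUntilCrash(File timeFile) throws IOException {
    if (timeFile.exists()) {
      timeFile.delete();                     // step 1: delete fstime
    }
    new FileOutputStream(timeFile).close();  // step 2: create/truncate to an empty file
    // simulated crash: step 3 (out.writeLong(checkpointTime)) never happens
  }

  // Assumed reader: reads the 8-byte checkpoint time with readLong,
  // which throws EOFException if fewer than 8 bytes are available.
  static long readCheckpointTime(File timeFile) throws IOException {
    try (DataInputStream in = new DataInputStream(new FileInputStream(timeFile))) {
      return in.readLong();
    }
  }

  public static void main(String[] args) throws IOException {
    File timeFile = File.createTempFile("fstime", null);
    timeFile.deleteOnExit();
    try (DataOutputStream out = new DataOutputStream(new FileOutputStream(timeFile))) {
      out.writeLong(42L);  // a previous, valid checkpoint time (hypothetical value)
    }
    System.out.println("before: " + readCheckpointTime(timeFile));

    updateUntilCrash(timeFile);
    try {
      readCheckpointTime(timeFile);
      System.out.println("after crash: read succeeded");
    } catch (EOFException e) {
      System.out.println("after crash: EOFException, file length = " + timeFile.length());
    }
  }
}
```

Running `main` prints `before: 42` followed by `after crash: EOFException, file length = 0`.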
[jira] Updated: (HDFS-1220) Namenode unable to start due to truncated fstime
[ https://issues.apache.org/jira/browse/HDFS-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thanh Do updated HDFS-1220:
---------------------------

Description:
- Summary: Updating the fstime file on disk is not atomic. If a crash happens in the middle of an update, then the next time the NameNode reboots it reads a truncated (empty) fstime file and is unable to start.
- Details: Below is the code that updates the fstime file on disk:

    void writeCheckpointTime(StorageDirectory sd) throws IOException {
      if (checkpointTime < 0L)
        return; // do not write negative time
      File timeFile = getImageFile(sd, NameNodeFile.TIME);
      if (timeFile.exists()) {
        timeFile.delete();
      }
      DataOutputStream out = new DataOutputStream(
          new FileOutputStream(timeFile));
      try {
        out.writeLong(checkpointTime);
      } finally {
        out.close();
      }
    }

  Basically, this involves three steps:
  1) delete the fstime file (timeFile.delete())
  2) truncate the fstime file (new FileOutputStream(timeFile))
  3) write the new time to the fstime file (out.writeLong(checkpointTime))
  If a crash happens after step 2 and before step 3, then on the next reboot the NameNode gets an exception when reading the time (8 bytes) from the empty fstime file.

This bug was found by our Failure Testing Service framework: http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and Haryadi Gunawi (hary...@eecs.berkeley.edu)

was:
- Summary: Updating the fstime file on disk is not atomic. If a crash happens in the middle of an update, then the next time the NameNode reboots it reads a truncated (empty) fstime file and is unable to start.
- Details: Basically, this involves three steps:
  1) delete the fstime file (timeFile.delete())
  2) truncate the fstime file (new FileOutputStream(timeFile))
  3) write the new time to the fstime file (out.writeLong(checkpointTime))
  If a crash happens after step 2 and before step 3, then on the next reboot the NameNode gets an exception when reading the time (8 bytes) from the empty fstime file.

This bug was found by our Failure Testing Service framework: http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and Haryadi Gunawi (hary...@eecs.berkeley.edu)
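A common way to remove the window between truncating fstime and writing the new time is to write the time to a temporary file, force it to disk, and then atomically rename it over fstime. The sketch below illustrates that general technique only; it is not the patch adopted for HDFS-1220. It uses Java 7+ `java.nio.file` APIs instead of the `File`/`FileOutputStream` calls in `writeCheckpointTime`, the class name `AtomicFstimeWriter` and the `fstime.tmp` sibling name are hypothetical, and it assumes a file system where `Files.move` with `ATOMIC_MOVE` performs an atomic rename that replaces an existing target, as `rename(2)` does on POSIX systems.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class AtomicFstimeWriter {
  // Writes checkpointTime to a hypothetical sibling temp file, forces it to disk,
  // then renames it over timeFile. Assuming an atomic, replacing rename, a crash
  // leaves either the old complete fstime or the new complete one, never an empty file.
  static void writeCheckpointTime(Path timeFile, long checkpointTime) throws IOException {
    if (checkpointTime < 0L) {
      return; // do not write negative time (same guard as the original code)
    }
    Path tmp = timeFile.resolveSibling(timeFile.getFileName() + ".tmp");
    try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.CREATE,
        StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
      // ByteBuffer is big-endian by default, matching DataOutputStream.writeLong.
      ByteBuffer buf = ByteBuffer.allocate(Long.BYTES).putLong(checkpointTime);
      buf.flip();
      while (buf.hasRemaining()) {
        ch.write(buf);
      }
      ch.force(true); // flush the new contents to the device before the rename
    }
    // Assumption: ATOMIC_MOVE replaces an existing target (true for rename(2) on POSIX).
    Files.move(tmp, timeFile, StandardCopyOption.ATOMIC_MOVE);
  }

  // Reads the 8-byte checkpoint time back with readLong.
  static long readCheckpointTime(Path timeFile) throws IOException {
    try (DataInputStream in = new DataInputStream(Files.newInputStream(timeFile))) {
      return in.readLong();
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("name-dir");
    Path timeFile = dir.resolve("fstime");
    writeCheckpointTime(timeFile, 42L);
    writeCheckpointTime(timeFile, 43L);
    System.out.println(readCheckpointTime(timeFile));
  }
}
```

Calling `writeCheckpointTime` twice and reading the file back returns the second value (`main` prints `43`). A crash before the move leaves the old fstime untouched; a crash after it leaves the new one. Making the rename itself durable would typically also require syncing the parent directory, which this sketch does not do.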