[jira] Updated: (HDFS-1220) Namenode unable to start due to truncated fstime

2010-06-16 Thread Thanh Do (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanh Do updated HDFS-1220:
---

Description: 
- Summary: updating the fstime file on disk is not atomic, so if a crash happens
in the middle of the update, the NameNode reads a truncated fstime file on its
next reboot and is unable to start.

- Details:
Basically, this involves three steps:
1) delete fstime file (timeFile.delete())
2) truncate fstime file (new FileOutputStream(timeFile))
3) write new time to fstime file (out.writeLong(checkpointTime))
If a crash happens after step 2 and before step 3, then on the next reboot the
NameNode gets an exception when reading the time (8 bytes) from an empty fstime
file.
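
The failure window between step 2 and step 3 can be reproduced outside HDFS.
The sketch below is only an illustration, not NameNode code: the class name
TruncatedFstimeDemo and its helper methods are hypothetical, and it uses
try-with-resources, which is newer than the Java version Hadoop 0.20.1 was
written for. It performs steps 1 and 2, skips step 3 to stand in for the crash,
and then attempts the 8-byte read.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class TruncatedFstimeDemo {

    /** Performs steps 1 and 2 only, simulating a crash before step 3. */
    static void truncateOnly(File timeFile) throws IOException {
        if (timeFile.exists()) { timeFile.delete(); }   // step 1: delete
        new FileOutputStream(timeFile).close();          // step 2: empty file, nothing written
    }

    /** Reads the 8-byte time; returns null when the file is too short to hold a long. */
    static Long readTimeOrNull(File timeFile) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(timeFile))) {
            return in.readLong();
        } catch (EOFException e) {
            // readLong() needs 8 bytes; an empty file ends the stream immediately.
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("fstime", ".tmp");
        truncateOnly(f);
        System.out.println("length=" + f.length() + ", time=" + readTimeOrNull(f));
        f.delete();
    }
}
```

In this sketch the EOFException is caught and mapped to null so the demo can
report it; a reader that does not handle the exception would instead fail at
this point, which matches the startup failure described above.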


This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)

  was:
- Summary: updating the fstime file on disk is not atomic, so if a crash happens
in the middle of the update, the NameNode reads a truncated fstime file on its
next reboot and is unable to start.

- Details:
Below is the code for updating the fstime file on disk:

  void writeCheckpointTime(StorageDirectory sd) throws IOException {
    if (checkpointTime < 0L)
      return; // do not write negative time
    File timeFile = getImageFile(sd, NameNodeFile.TIME);
    if (timeFile.exists()) { timeFile.delete(); }
    DataOutputStream out = new DataOutputStream(
        new FileOutputStream(timeFile));
    try {
      out.writeLong(checkpointTime);
    } finally {
      out.close();
    }
  }

Basically, this involves three steps:
1) delete fstime file (timeFile.delete())
2) truncate fstime file (new FileOutputStream(timeFile))
3) write new time to fstime file (out.writeLong(checkpointTime))
If a crash happens after step 2 and before step 3, then on the next reboot the
NameNode gets an exception when reading the time (8 bytes) from an empty fstime
file.


This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)


 Namenode unable to start due to truncated fstime
 

 Key: HDFS-1220
 URL: https://issues.apache.org/jira/browse/HDFS-1220
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1
Reporter: Thanh Do

 - Summary: updating the fstime file on disk is not atomic, so if a crash happens
 in the middle of the update, the NameNode reads a truncated fstime file on its
 next reboot and is unable to start.

 - Details:
 Basically, this involves three steps:
 1) delete fstime file (timeFile.delete())
 2) truncate fstime file (new FileOutputStream(timeFile))
 3) write new time to fstime file (out.writeLong(checkpointTime))
 If a crash happens after step 2 and before step 3, then on the next reboot the
 NameNode gets an exception when reading the time (8 bytes) from an empty fstime
 file.
 This bug was found by our Failure Testing Service framework:
 http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
 For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
 Haryadi Gunawi (hary...@eecs.berkeley.edu)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1220) Namenode unable to start due to truncated fstime

2010-06-16 Thread Thanh Do (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanh Do updated HDFS-1220:
---

Description: 
- Summary: updating the fstime file on disk is not atomic, so if a crash happens
in the middle of the update, the NameNode reads a truncated fstime file on its
next reboot and is unable to start.

- Details:
Basically, this involves three steps:
1) delete fstime file (timeFile.delete())
2) truncate fstime file (new FileOutputStream(timeFile))
3) write new time to fstime file (out.writeLong(checkpointTime))
If a crash happens after step 2 and before step 3, then on the next reboot the
NameNode gets an exception when reading the time (8 bytes) from an empty fstime
file.


This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)

  was:
- Summary: updating the fstime file on disk is not atomic, so if a crash happens
in the middle of the update, the NameNode reads a truncated fstime file on its
next reboot and is unable to start.

- Details:
Below is the code for updating the fstime file on disk:

  void writeCheckpointTime(StorageDirectory sd) throws IOException {
    if (checkpointTime < 0L)
      return; // do not write negative time
    File timeFile = getImageFile(sd, NameNodeFile.TIME);
    if (timeFile.exists()) { timeFile.delete(); }
    DataOutputStream out = new DataOutputStream(
        new FileOutputStream(timeFile));
    try {
      out.writeLong(checkpointTime);
    } finally {
      out.close();
    }
  }

Basically, this involves three steps:
1) delete fstime file (timeFile.delete())
2) truncate fstime file (new FileOutputStream(timeFile))
3) write new time to fstime file (out.writeLong(checkpointTime))
If a crash happens after step 2 and before step 3, then on the next reboot the
NameNode gets an exception when reading the time (8 bytes) from an empty fstime
file.


This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)


 Namenode unable to start due to truncated fstime
 

 Key: HDFS-1220
 URL: https://issues.apache.org/jira/browse/HDFS-1220
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1
Reporter: Thanh Do

 - Summary: updating the fstime file on disk is not atomic, so if a crash happens
 in the middle of the update, the NameNode reads a truncated fstime file on its
 next reboot and is unable to start.

 - Details:
 Basically, this involves three steps:
 1) delete fstime file (timeFile.delete())
 2) truncate fstime file (new FileOutputStream(timeFile))
 3) write new time to fstime file (out.writeLong(checkpointTime))
 If a crash happens after step 2 and before step 3, then on the next reboot the
 NameNode gets an exception when reading the time (8 bytes) from an empty fstime
 file.
 This bug was found by our Failure Testing Service framework:
 http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
 For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
 Haryadi Gunawi (hary...@eecs.berkeley.edu)




[jira] Updated: (HDFS-1220) Namenode unable to start due to truncated fstime

2010-06-16 Thread Thanh Do (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanh Do updated HDFS-1220:
---

Description: 
- Summary: updating the fstime file on disk is not atomic, so if a crash happens
in the middle of the update, the NameNode reads a truncated fstime file on its
next reboot and is unable to start.

- Details:
Below is the code for updating the fstime file on disk:

  void writeCheckpointTime(StorageDirectory sd) throws IOException {
    if (checkpointTime < 0L)
      return; // do not write negative time
    File timeFile = getImageFile(sd, NameNodeFile.TIME);
    if (timeFile.exists()) { timeFile.delete(); }
    DataOutputStream out = new DataOutputStream(
        new FileOutputStream(timeFile));
    try {
      out.writeLong(checkpointTime);
    } finally {
      out.close();
    }
  }

Basically, this involves three steps:
1) delete fstime file (timeFile.delete())
2) truncate fstime file (new FileOutputStream(timeFile))
3) write new time to fstime file (out.writeLong(checkpointTime))
If a crash happens after step 2 and before step 3, then on the next reboot the
NameNode gets an exception when reading the time (8 bytes) from an empty fstime
file.
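
One common way to close a window like this is to write the new value to a
temporary file and then rename it over the old one, so a reader sees either the
old 8 bytes or the new 8 bytes. The sketch below is a possible approach under
that assumption, not the fix adopted for this issue: the class
AtomicFstimeWriter, its method names, and the ".tmp" suffix are hypothetical,
and it relies on java.nio.file, which postdates Hadoop 0.20.1. Whether an
atomic move replaces an existing target is implementation specific, so
behavior should be checked on the target platform.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public class AtomicFstimeWriter {

    /** Writes the time to a sibling temp file, syncs it, then renames it into place. */
    static void writeCheckpointTimeAtomically(File timeFile, long checkpointTime)
            throws IOException {
        if (checkpointTime < 0L) {
            return; // do not write negative time, as in the original method
        }
        File target = timeFile.getAbsoluteFile();
        // Hypothetical temp name; it lives in the same directory so the rename
        // stays on one file system.
        File tmp = new File(target.getParentFile(), target.getName() + ".tmp");
        try (FileOutputStream fos = new FileOutputStream(tmp);
             DataOutputStream out = new DataOutputStream(fos)) {
            out.writeLong(checkpointTime);
            out.flush();
            fos.getFD().sync(); // ask the OS to persist the bytes before the rename
        }
        // The old fstime file is never deleted or truncated; it is only replaced.
        Files.move(tmp.toPath(), target.toPath(),
                StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Reads back the 8-byte time written above. */
    static long readCheckpointTime(File timeFile) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(timeFile))) {
            return in.readLong();
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("fstime-demo").toFile();
        File f = new File(dir, "fstime");
        writeCheckpointTimeAtomically(f, 1276646400000L);
        System.out.println("time=" + readCheckpointTime(f));
    }
}
```

With this ordering, a crash before the rename leaves the previous fstime file
untouched, and a leftover temp file can be discarded at startup.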


This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)

  was:
- Summary: updating the fstime file on disk is not atomic, so if a crash happens
in the middle of the update, the NameNode reads a truncated fstime file on its
next reboot and is unable to start.

- Details:
Basically, this involves three steps:
1) delete fstime file (timeFile.delete())
2) truncate fstime file (new FileOutputStream(timeFile))
3) write new time to fstime file (out.writeLong(checkpointTime))
If a crash happens after step 2 and before step 3, then on the next reboot the
NameNode gets an exception when reading the time (8 bytes) from an empty fstime
file.


This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)


 Namenode unable to start due to truncated fstime
 

 Key: HDFS-1220
 URL: https://issues.apache.org/jira/browse/HDFS-1220
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.1
Reporter: Thanh Do

 - Summary: updating the fstime file on disk is not atomic, so if a crash happens
 in the middle of the update, the NameNode reads a truncated fstime file on its
 next reboot and is unable to start.

 - Details:
 Below is the code for updating the fstime file on disk:

   void writeCheckpointTime(StorageDirectory sd) throws IOException {
     if (checkpointTime < 0L)
       return; // do not write negative time
     File timeFile = getImageFile(sd, NameNodeFile.TIME);
     if (timeFile.exists()) { timeFile.delete(); }
     DataOutputStream out = new DataOutputStream(
         new FileOutputStream(timeFile));
     try {
       out.writeLong(checkpointTime);
     } finally {
       out.close();
     }
   }

 Basically, this involves three steps:
 1) delete fstime file (timeFile.delete())
 2) truncate fstime file (new FileOutputStream(timeFile))
 3) write new time to fstime file (out.writeLong(checkpointTime))
 If a crash happens after step 2 and before step 3, then on the next reboot the
 NameNode gets an exception when reading the time (8 bytes) from an empty fstime
 file.
 This bug was found by our Failure Testing Service framework:
 http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
 For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
 Haryadi Gunawi (hary...@eecs.berkeley.edu)
