[ 
https://issues.apache.org/jira/browse/HDFS-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thanh Do updated HDFS-1221:
---------------------------

    Description: 
- Summary: 
If a crash happens during FSEditLog.createEditLogFile(), the
edits log file on disk may be left with stale data. On the next restart, the
NameNode throws an exception while parsing that stale data, and startup fails.
Note: This is just one example. Because the edits log (and fsimage)
have no checksum, they are vulnerable to other kinds of corruption as well.
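
To illustrate what a checksum would buy here, below is a minimal sketch of a
checksummed record. It is not HDFS code: the class ChecksummedRecord, the
record layout (length, payload, CRC32) and the method names are assumptions
made for this example only.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical record format: [4-byte length][payload][8-byte CRC32 value].
class ChecksummedRecord {
    static final int OVERHEAD = 4 + 8; // length field plus stored checksum

    // Wrap a payload with its length and CRC32 checksum.
    static byte[] encode(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        ByteBuffer buf = ByteBuffer.allocate(OVERHEAD + payload.length);
        buf.putInt(payload.length).put(payload).putLong(crc.getValue());
        return buf.array();
    }

    // Return true only if the length is plausible and the checksum matches.
    static boolean isIntact(byte[] record) {
        if (record.length < OVERHEAD) {
            return false;
        }
        ByteBuffer buf = ByteBuffer.wrap(record);
        int length = buf.getInt();
        // Reject empty records explicitly: the CRC32 of zero bytes is 0,
        // so a zero-filled region would otherwise look valid.
        if (length <= 0 || length != record.length - OVERHEAD) {
            return false;
        }
        byte[] payload = new byte[length];
        buf.get(payload);
        CRC32 crc = new CRC32();
        crc.update(payload);
        return crc.getValue() == buf.getLong();
    }
}
```

With a check of this kind, a reader can reject a damaged or zero-filled
record up front instead of interpreting its bytes as an operation.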
 
- Details:
The steps to create a new edits log (as we infer from the HDFS code) are:
1) truncate the file to zero size
2) write FSConstants.LAYOUT_VERSION to the buffer
3) append the end-of-file marker OP_INVALID to the buffer
4) preallocate 1 MB of data, filled with zeros
5) flush the buffer to disk
 
Note that only steps 1, 4, and 5 actually change the data on disk.
Now suppose a crash happens after step 4 but before step 5.
On the next restart, the NameNode loads this edits log file, which contains
only zeros. The first field parsed is LAYOUT_VERSION, which reads as 0. The
NameNode has code to handle that case, so parsing continues
(although we would expect LAYOUT_VERSION to be -18).
Next it parses the operation code, which is also 0. Unfortunately, 0
is the value of OP_ADD, so the NameNode expects the parameters of
that operation. It calls readString to read the path, which throws
an exception, and the restart fails.
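
The failure sequence above can be reproduced with a small stand-alone model.
This is a sketch, not the HDFS implementation: the class ZeroFilledEditsDemo
and its methods are hypothetical, the value -1 for OP_INVALID is an
assumption, and the real failure inside readString is simplified here to
rejecting an empty path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative model only; this is not the HDFS implementation.
class ZeroFilledEditsDemo {
    static final int LAYOUT_VERSION = -18;        // value the report expects
    static final byte OP_ADD = 0;                 // per the report, OP_ADD is 0
    static final byte OP_INVALID = -1;            // assumed end-of-file marker
    static final int PREALLOCATION = 1024 * 1024; // 1 MB, as in step 4

    // File as it looks when step 5 completed: header, then the end marker.
    static byte[] completedFile() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(LAYOUT_VERSION);
            out.writeByte(OP_INVALID);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // File as it looks after a crash between steps 4 and 5: only zeros.
    static byte[] crashedFile() {
        return new byte[PREALLOCATION];
    }

    // Hypothetical parser: reports how loading the file would end.
    static String load(byte[] contents) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(contents))) {
            int version = in.readInt();
            int ops = 0;
            while (true) {
                byte opcode = in.readByte();
                if (opcode == OP_INVALID) {
                    return "ok: version " + version + ", " + ops + " ops";
                }
                if (opcode != OP_ADD) {
                    throw new IOException("unknown opcode " + opcode);
                }
                // Simplified stand-in for reading the path of an OP_ADD.
                int pathLength = in.readUnsignedShort();
                if (pathLength == 0) {
                    throw new IOException("OP_ADD with empty path");
                }
                in.readFully(new byte[pathLength]);
                ops++;
            }
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(load(completedFile()));
        System.out.println(load(crashedFile()));
    }
}
```

In this model the zero-filled file is read as layout version 0 followed by
an OP_ADD, so loading fails on the first operation, while the completed file
ends cleanly at OP_INVALID.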

This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)

  was:
- Summary: 
If a crash happens during FSEditLog.createEditLogFile(), the
edits log file on disk may be left with stale data. On the next restart, the
NameNode throws an exception while parsing that stale data, and startup fails.
Note: This is just one example. Because the edits log (and fsimage)
have no checksum, they are vulnerable to other kinds of corruption as well.
 
- Details:
The steps to create a new edits log (as we infer from the HDFS code) are:
1) truncate the file to zero size
2) write FSConstants.LAYOUT_VERSION to the buffer
3) append the end-of-file marker OP_INVALID to the buffer
4) preallocate 1 MB of data, filled with zeros
5) flush the buffer to disk
 
Note that only steps 1, 4, and 5 actually change the data on disk.
Now suppose a crash happens after step 4 but before step 5.
On the next restart, the NameNode loads this edits log file, which contains
only zeros. The first field parsed is LAYOUT_VERSION, which reads as 0. The
NameNode has code to handle that case, so parsing continues
(although we would expect LAYOUT_VERSION to be -18).
Next it parses the operation code, which is also 0. Unfortunately, 0
is the value of OP_ADD, so the NameNode expects the parameters of
that operation. It calls readString to read the path, which throws
an exception, and the restart fails.

We found this problem at almost the same time as the HDFS developers.
Basically, the edits log is truncated before fsimage.ckpt is renamed to fsimage.
Hence, any crash that happens after the truncation but before the renaming
leads to data loss. A detailed description can be found here:
https://issues.apache.org/jira/browse/HDFS-955
This bug was found by our Failure Testing Service framework:
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
Haryadi Gunawi (hary...@eecs.berkeley.edu)

    Component/s: name-node

> NameNode unable to start due to stale edits log after a crash
> -------------------------------------------------------------
>
>                 Key: HDFS-1221
>                 URL: https://issues.apache.org/jira/browse/HDFS-1221
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.1
>            Reporter: Thanh Do
>
> - Summary: 
> If a crash happens during FSEditLog.createEditLogFile(), the
> edits log file on disk may be left with stale data. On the next restart, the
> NameNode throws an exception while parsing that stale data, and startup fails.
> Note: This is just one example. Because the edits log (and fsimage)
> have no checksum, they are vulnerable to other kinds of corruption as well.
>  
> - Details:
> The steps to create a new edits log (as we infer from the HDFS code) are:
> 1) truncate the file to zero size
> 2) write FSConstants.LAYOUT_VERSION to the buffer
> 3) append the end-of-file marker OP_INVALID to the buffer
> 4) preallocate 1 MB of data, filled with zeros
> 5) flush the buffer to disk
>  
> Note that only steps 1, 4, and 5 actually change the data on disk.
> Now suppose a crash happens after step 4 but before step 5.
> On the next restart, the NameNode loads this edits log file, which contains
> only zeros. The first field parsed is LAYOUT_VERSION, which reads as 0. The
> NameNode has code to handle that case, so parsing continues
> (although we would expect LAYOUT_VERSION to be -18).
> Next it parses the operation code, which is also 0. Unfortunately, 0
> is the value of OP_ADD, so the NameNode expects the parameters of
> that operation. It calls readString to read the path, which throws
> an exception, and the restart fails.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
> Haryadi Gunawi (hary...@eecs.berkeley.edu)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
