how to enable "sync on block close" in HDFS?
--Sent from my Sony mobile.

On Jul 2, 2013 6:47 AM, "Lars Hofhansl" <[email protected]> wrote:
> HBase is interesting here, because it rewrites old data into new files. So
> a power outage by default would not just lose new data but potentially old
> data as well.
> You can enable "sync on block close" in HDFS, and then at least be sure
> that closed blocks (and thus files) are synced to disk physically.
> I found that if that is paired with the "sync behind writes" fadvise hint,
> the performance impact is minimal.
>
> -- Lars
>
> Dave Latham <[email protected]> wrote:
>
> >Thanks for the response, Suresh.
> >
> >I'm not sure that I understand the details properly. From my reading of
> >HDFS-744, the hsync API would allow a client to make sure that at any point
> >in time its writes so far have hit the disk. For example, HBase could
> >apply an hsync after adding some edits to its WAL to ensure those edits are
> >fully durable in a file which is still open.
> >
> >However, in this case the dfs file was closed and even renamed. Is it the
> >case that even after a dfs file is closed and renamed, the data blocks
> >would still not be synced, and would still be stored by the datanode in
> >"blocksBeingWritten" rather than in "current"? If that is the case, would it
> >be better for the NameNode not to reject replicas that are in
> >blocksBeingWritten, especially if it doesn't have any other replicas
> >available?
> >
> >Dave
> >
> >
> >On Mon, Jul 1, 2013 at 3:16 PM, Suresh Srinivas <[email protected]> wrote:
> >
> >> Yes, this is a known issue.
> >>
> >> The HDFS part of this was addressed in
> >> https://issues.apache.org/jira/browse/HDFS-744 for 2.0.2-alpha and is not
> >> available in the 1.x releases. I think HBase does not use this API yet.
> >>
> >>
> >> On Mon, Jul 1, 2013 at 3:00 PM, Dave Latham <[email protected]> wrote:
> >>
> >> > We're running HBase over HDFS 1.0.2 on about 1000 nodes. On Saturday the
> >> > data center we were in had a total power failure and the cluster went down
> >> > hard. When we brought it back up, HDFS reported 4 files as CORRUPT. We
> >> > recovered the data in question from our secondary datacenter, but I'm
> >> > trying to understand what happened and whether this is a bug in HDFS that
> >> > should be fixed.
> >> >
> >> > From what I can tell, the file was created and closed by the dfs client
> >> > (hbase). Then HBase renamed it into a new directory and deleted some other
> >> > files containing the same data. Then the cluster lost power. After the
> >> > cluster was restarted, the datanodes reported in to the namenode, but the
> >> > blocks for this file appeared as "blocks being written" - the namenode
> >> > rejected them and the datanodes deleted the blocks. At that point there
> >> > were no replicas for the blocks and the files were marked CORRUPT. The
> >> > underlying file systems are ext3. Some questions that I would love to get
> >> > answers for, if anyone with a deeper understanding of HDFS can chime in:
> >> >
> >> > - Is this a known scenario where data loss is expected? (I found
> >> > HDFS-1539, but that seems different.)
> >> > - When are blocks moved from blocksBeingWritten to current? Does that
> >> > happen before a file close operation is acknowledged to an hdfs client?
> >> > - Could it be that the DataNodes actually moved the blocks to current,
> >> > but after the restart ext3 rewound state somehow (forgive my ignorance of
> >> > underlying file system behavior)?
> >> > - Is there any other explanation for how this can happen?
> >> >
> >> > Here is a sequence of selected relevant log lines from the RS (HBase
> >> > Region Server), NN (NameNode), and DN (DataNode - 1 example of the 3 in
> >> > question). It includes everything that mentions the block in question in
> >> > the NameNode and one DataNode log. Please let me know if there is more
> >> > information that would be helpful.
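[Editor's note: Dave's second question, whether an acknowledged close() implies the bytes are on disk, turns on a general OS fact: closing a file descriptor flushes to the page cache, not to the platter. Only an explicit fsync forces the data down. A minimal local-filesystem sketch of the distinction in plain Java (no Hadoop; the file name and payload are arbitrary examples):]

```java
// Sketch: why a "closed" file can still vanish on power loss.
// close() hands the bytes to the OS page cache; only an fsync
// (FileChannel.force) guarantees they reached the disk. An HDFS 1.x
// DataNode similarly finalizes block files without an fsync unless
// configured otherwise, so a closed HDFS file may not be durable.
import java.io.FileOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class CloseVsSync {
    public static void writeDurably(Path p, byte[] data) throws Exception {
        try (FileOutputStream out = new FileOutputStream(p.toFile())) {
            out.write(data);
            out.getChannel().force(true); // fsync: data + metadata on disk
        } // with close() alone, the bytes could still be only in the page cache
    }

    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("block", ".dat");
        writeDurably(p, "edit-1".getBytes("UTF-8"));
        System.out.println(Files.size(p)); // prints 6
        Files.delete(p);
    }
}
```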
> >> > RS 2013-06-29 11:16:06,812 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file=hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c with permission=rwxrwxrwx
> >> > NN 2013-06-29 11:16:06,830 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c. blk_1395839728632046111_357084589
> >> > DN 2013-06-29 11:16:06,832 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_1395839728632046111_357084589 src: /10.0.5.237:14327 dest: /10.0.5.237:50010
> >> > NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.6.1:50010 is added to blk_1395839728632046111_357084589 size 25418340
> >> > NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.6.24:50010 is added to blk_1395839728632046111_357084589 size 25418340
> >> > NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.5.237:50010 is added to blk_1395839728632046111_357084589 size 25418340
> >> > DN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_1395839728632046111_357084589 of size 25418340 from /10.0.5.237:14327
> >> > DN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_1395839728632046111_357084589 terminating
> >> > NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on file /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c from client DFSClient_hb_rs_hs745,60020,1372470111932
> >> > NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c is closed by DFSClient_hb_rs_hs745,60020,1372470111932
> >> > RS 2013-06-29 11:16:11,393 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming compacted file at hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c to hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/n/6e0cc30af6e64e56ba5a539fdf159c4c
> >> > RS 2013-06-29 11:16:11,505 INFO org.apache.hadoop.hbase.regionserver.Store: Completed major compaction of 7 file(s) in n of users-6,\x12\xBDp\xA3,1359426311784.b5b0820cde759ae68e333b2f4015bb7e. into 6e0cc30af6e64e56ba5a539fdf159c4c, size=24.2m; total size for store is 24.2m
> >> >
> >> > ------- CRASH, RESTART ---------
> >> >
> >> > NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_1395839728632046111_357084589 on 10.0.6.1:50010 size 21978112 but was rejected: Reported as block being written but is a block of closed file.
> >> > NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet of 10.0.6.1:50010
> >> > NN 2013-06-29 12:01:20,155 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_1395839728632046111_357084589 on 10.0.5.237:50010 size 16971264 but was rejected: Reported as block being written but is a block of closed file.
> >> > NN 2013-06-29 12:01:20,155 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet of 10.0.5.237:50010
> >> > NN 2013-06-29 12:01:20,175 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_1395839728632046111_357084589 on 10.0.6.24:50010 size 21913088 but was rejected: Reported as block being written but is a block of closed file.
> >> > NN 2013-06-29 12:01:20,175 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet of 10.0.6.24:50010
> >> > (Note: the clock on the server running the DN is wrong after the restart. I believe the timestamps are off by 6 hours.)
> >> > DN 2013-06-29 06:07:22,877 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Scheduling block blk_1395839728632046111_357084589 file /data/hadoop/dfs/data/blocksBeingWritten/blk_1395839728632046111 for deletion
> >> > DN 2013-06-29 06:07:24,952 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleted block blk_1395839728632046111_357084589 at file /data/hadoop/dfs/data/blocksBeingWritten/blk_1395839728632046111
> >> >
> >> >
> >> > Thanks,
> >> > Dave
> >> >
> >>
> >>
> >>
> >> --
> >> http://hortonworks.com/download/
> >>
>
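[Editor's note: to answer the question at the top of the thread, the "sync on block close" and "sync behind writes" behaviors Lars mentions are DataNode settings in hdfs-site.xml. A sketch, with property names as they appear in the Hadoop 1.x/2.x lines (verify against hdfs-default.xml for your release):]

```xml
<!-- hdfs-site.xml, DataNode side (property names assumed from 1.x/2.x) -->
<property>
  <!-- fsync a block file and its metadata when the block is finalized,
       so a closed file survives a power failure -->
  <name>dfs.datanode.synconclose</name>
  <value>true</value>
</property>
<property>
  <!-- hint the kernel to write dirty block data to disk while the block
       is still being written, so the sync on close has little left to flush -->
  <name>dfs.datanode.sync.behind.writes</name>
  <value>true</value>
</property>
```

These settings only protect files once their blocks are closed; for durability of a still-open file (the WAL case discussed above), the hsync() client API added by HDFS-744 in 2.0.2-alpha is the complementary tool.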
