"sync" is a loosely defined term in HDFS, which has two distinct calls: hsync
and hflush.
hflush pushes all changes currently buffered at a DFSClient out to all replica
nodes (but does not force them to disk).

Before HDFS-744, hsync was identical to hflush. With HDFS-744, hsync can be
used to force data to disk at the replicas.
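
To make the difference concrete, here is a small toy model in Java. It is only
a sketch of the semantics described above, not HDFS code: the Replica and
Pipeline classes and their fields are made up for illustration, and the real
client and datanodes work with packets and acks rather than strings.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT part of HDFS.
class SyncSemanticsSketch {

    /** One replica: data it has received (memory) vs. data forced to disk. */
    static class Replica {
        final StringBuilder memory = new StringBuilder();
        final StringBuilder disk = new StringBuilder();
    }

    /** A client-side stream writing to a pipeline of replicas. */
    static class Pipeline {
        final List<Replica> replicas = new ArrayList<>();
        private final StringBuilder clientBuffer = new StringBuilder();

        Pipeline(int replication) {
            for (int i = 0; i < replication; i++) {
                replicas.add(new Replica());
            }
        }

        /** Writes only land in the client-side buffer. */
        void write(String data) {
            clientBuffer.append(data);
        }

        /** hflush: every replica has seen the data, but nothing is forced to disk. */
        void hflush() {
            for (Replica r : replicas) {
                r.memory.append(clientBuffer);
            }
            clientBuffer.setLength(0);
        }

        /** hsync (post HDFS-744 semantics): hflush, then force to disk at each replica. */
        void hsync() {
            hflush();
            for (Replica r : replicas) {
                r.disk.setLength(0);
                r.disk.append(r.memory);
            }
        }
    }

    public static void main(String[] args) {
        Pipeline p = new Pipeline(3);
        p.write("edit-1");
        System.out.println("after write:  memory='" + p.replicas.get(0).memory
                + "' disk='" + p.replicas.get(0).disk + "'");
        p.hflush();
        System.out.println("after hflush: memory='" + p.replicas.get(0).memory
                + "' disk='" + p.replicas.get(0).disk + "'");
        p.hsync();
        System.out.println("after hsync:  memory='" + p.replicas.get(0).memory
                + "' disk='" + p.replicas.get(0).disk + "'");
    }
}
```

In this model a write that has not been followed by hflush exists only at the
client, so no replica holds a copy of it yet.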


When HBase refers to "sync", it means hflush semantics (at least until
HBASE-5954 is finished).
That is, a sync here ensures that the replica nodes have seen the changes,
which is what you want.


So when you say "since another copy is always there on the replica nodes", that
is only guaranteed after an hflush (which, again, is what HBase calls sync).


I've also written about this here: 
http://hadoop-hbase.blogspot.com/2012/05/hbase-hdfs-and-durable-sync.html

-- Lars



________________________________
 From: Mohit Anchlia <[email protected]>
To: [email protected] 
Sent: Tuesday, July 31, 2012 6:09 PM
Subject: sync on writes
 
In the HBase book it is mentioned that the default behaviour of a write is to
call sync on each node before sending replica copies to the nodes in the
pipeline. Is there a reason this was kept as the default? If data is
getting written on multiple nodes, then the likelihood of losing data is really
low, since another copy is always there on the replica nodes. Is it OK to
make this sync async, and is it advisable?
