I believe you are talking about enabling the dfs.support.append feature? I benchmarked it previously with the setting both disabled and enabled, and I didn't find much difference. It would be great if someone else could confirm this.
Best Regards,

Jerry

On Wednesday, August 1, 2012, Alex Baranau wrote:

> I believe that this is *not default*, but the *current* implementation of
> sync(). I.e. (please correct me if I'm wrong) the n-way write approach is
> not available yet.
> You might confuse it with the fact that by default, sync() is called on
> every edit. And you can change it by using "deferred log flushing". Either
> way, sync() is going to be a pipelined write.
>
> There's an explanation of the benefits of pipelined and n-way writes in
> the book (p337); it's not just about which approach provides better
> durability of saved edits. Both of them do. But they can take different
> amounts of time to execute and utilize the network differently: pipelined
> *may* be slower but can saturate network bandwidth better.
>
> Alex Baranau
> ------
> Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
> Solr
>
> On Tue, Jul 31, 2012 at 9:09 PM, Mohit Anchlia <[email protected]> wrote:
>
> > In the HBase book it is mentioned that the default behaviour of write is
> > to call sync on each node before sending replica copies to the nodes in
> > the pipeline. Is there a reason this was kept default? If data is
> > getting written on multiple nodes then the likelihood of losing data is
> > really low, since another copy is always there on the replica nodes. Is
> > it ok to make this sync async, and is it advisable?
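
To make the pipelined versus n-way trade-off from Alex's message a bit more concrete, here is a rough back-of-the-envelope timing model. It is only a sketch under stated assumptions, not how HDFS or HBase actually computes anything: the function names, the single shared bandwidth figure, the fixed per-hop latency, and the omission of acknowledgement traffic are all simplifications introduced for illustration.

```python
# Toy model (hypothetical, for illustration only): estimate how long one
# write takes to reach all replicas under two replication strategies.
# Assumptions: every link has the same bandwidth, every hop adds the same
# fixed latency, data streams through the pipeline without buffering delays,
# and acknowledgements are ignored.


def pipelined_time(size_bytes, replicas, bandwidth, hop_latency):
    """Client -> replica 1 -> replica 2 -> ...

    The client sends the data once, so its link carries size_bytes,
    but every hop in the chain adds its latency.
    """
    return size_bytes / bandwidth + replicas * hop_latency


def n_way_time(size_bytes, replicas, bandwidth, hop_latency):
    """Client sends to all replicas in parallel.

    Only one hop of latency, but the client's outbound link has to
    carry one full copy per replica.
    """
    return replicas * size_bytes / bandwidth + hop_latency


if __name__ == "__main__":
    bandwidth = 125_000_000   # bytes/s, roughly 1 Gbit/s (assumed)
    hop_latency = 0.0005      # seconds per hop (assumed)
    replicas = 3

    for label, size in (("small edit", 1_000), ("large block", 64 * 1024 * 1024)):
        p = pipelined_time(size, replicas, bandwidth, hop_latency)
        n = n_way_time(size, replicas, bandwidth, hop_latency)
        print(f"{label}: pipelined={p:.6f}s n-way={n:.6f}s")
```

Under these assumptions, a small edit is dominated by per-hop latency, so the pipelined write comes out slower, while a large transfer is dominated by the client's outbound bandwidth, so the pipelined write comes out faster. That is consistent with the "may be slower but can saturate network bandwidth better" remark, but real numbers depend on the cluster and should be measured.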
