They are flushed to 3 nodes (but not sync'ed to disk on those replicas), so you'll eat 3 network RTTs.
I wrote a bit about this here: http://hadoop-hbase.blogspot.com/2012/05/hbase-hdfs-and-durable-sync.html

You can switch a column family to deferred log flush. In that case the edit is flushed to the 3 replicas asynchronously within 1 or 2 seconds. (And if I ever get to finish HBASE-7801, one can control this per mutation.)

-- Lars

________________________________
From: Dan Crosta <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Saturday, March 2, 2013 10:47 AM
Subject: Re: HBase Thrift inserts bottlenecked somewhere -- but where?

On Mar 2, 2013, at 12:38 PM, lars hofhansl wrote:
> "That's only true from the HDFS perspective, right? Any given region is
> "owned" by 1 of the 6 regionservers at any given time, and writes are
> buffered to memory before being persisted to HDFS, right?"
>
> Only if you disabled the WAL; otherwise each change is written to the WAL
> first, and then committed to the memstore.
> So in that sense it's even worse. Each edit is written twice to the FS,
> replicated 3 times, and all that on only 6 data nodes.

Are these writes synchronized somehow? Could there be a locking problem somewhere that wouldn't show up as utilization of disk or CPU?

What is the upshot of disabling the WAL -- I assume it means that if a RegionServer crashes, you lose any writes that it has in memory but not yet committed to HFiles?

> 20k writes does seem a bit low.

I adjusted dfs.datanode.handler.count from 3 to 10 and now we're up to about 22-23k writes per second, but still no apparent contention for any of the basic system resources. Any other suggestions on things to try?

Thanks,
- Dan
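For concreteness, here is a minimal sketch of the two durability trade-offs discussed above, written against the 0.94-era HBase Java client; the table name "mytable", column family "cf", qualifier "q", and row key "row1" are made up for illustration. Note that in this API deferred log flush is a table-level attribute set through the table descriptor. Both options trade durability for write throughput: with deferred flush a region server crash can lose the last second or two of edits, and with the WAL skipped entirely it can lose everything still sitting in the memstore, exactly as Dan surmises.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WalDurabilityExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            byte[] tableName = Bytes.toBytes("mytable");  // hypothetical table

            // Option 1: deferred log flush for the whole table.
            // Edits still go to the WAL, but the log is flushed to the HDFS
            // replicas asynchronously on a short interval, so a region server
            // crash can lose that last window of edits.
            HBaseAdmin admin = new HBaseAdmin(conf);
            HTableDescriptor desc = admin.getTableDescriptor(tableName);
            desc.setDeferredLogFlush(true);
            admin.disableTable(tableName);
            admin.modifyTable(tableName, desc);
            admin.enableTable(tableName);
            admin.close();

            // Option 2: skip the WAL entirely for an individual Put.
            // Fastest, but the edit exists only in the memstore until it is
            // flushed to an HFile; a crash before that flush loses it.
            HTable table = new HTable(conf, "mytable");
            Put put = new Put(Bytes.toBytes("row1"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
            put.setWriteToWAL(false);
            table.put(put);
            table.close();
        }
    }

The dfs.datanode.handler.count setting Dan tuned is a separate, server-side knob: it is configured in hdfs-site.xml on the datanodes and is independent of these client-side durability choices.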
