On Fri, Dec 9, 2011 at 11:40 AM, yonghu <[email protected]> wrote:
> I read some discussions from the mail-list. It mentions the read and
> write operations for the same data object will be routed into the same
> RegionServer. This strategy can guarantee data consistency. But, how
> about availability? If this RegionServer is down or temporarily not
> available, the master will assign a new RegionServer for processing
> data request or just wait until that RegionServer comes back? If mater
> assigns new RegionServer, how can new RegionServer obtains data?
>

If a regionserver is down for so long that it loses its lease in
ZooKeeper, another regionserver will be told to serve the unavailable
regionserver's data. Data is kept in HDFS, a distributed filesystem, to
which the new regionserver has access.

> The other issue is about work-balance. If a huge amount of read and
> write operations only apply on a small set of data, one RegionServer
> may become a hot-spot. How HBase deal with this problems?
>

If a single cell is hot, there is not much you can do currently (there
has been talk of making read replicas available, but that is still a
TODO). If a single row is hot, you'd need to redo your schema to spread
the row content. If the region is hot, you can split it and spread its
shards about the cluster.

> The last question is about data replica. The HBase data is still
> stored in HDFS. HDFS will use eager synchronization (pipelining) to
> synchronize all replicas. If HBase write data into HDFS, when should
> HDFS return the write finishing acknowledge to HBase, just waiting
> until one replica update or until all replicas update?
>

All.

St.Ack
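
To make the "redo your schema to spread the row content" suggestion above a little more concrete, here is a minimal sketch of one common way to do it: break the hot logical row into several physical rows and prefix each with a bucket number, so the pieces sort into different key ranges (and so, with suitable splits, different regions). This is an illustration only, not something from the reply. The key layout, the bucket count, and the function names (bucket_for, spread_key, scan_prefixes) are all assumptions made up for the example, and no HBase client calls are shown.

```python
import hashlib
from typing import List

# Assumed bucket count; in practice it would be chosen to match how many
# regions the hot data should be spread over.
NUM_BUCKETS = 8


def bucket_for(part: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map one piece of the logical row to a stable bucket number."""
    digest = hashlib.md5(part.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


def spread_key(logical_row: str, part: str,
               num_buckets: int = NUM_BUCKETS) -> bytes:
    """Build the physical row key for one piece of a hot logical row.

    Hypothetical layout: <2-digit bucket>|<logical row>|<part>
    """
    bucket = bucket_for(part, num_buckets)
    return f"{bucket:02d}|{logical_row}|{part}".encode("utf-8")


def scan_prefixes(logical_row: str,
                  num_buckets: int = NUM_BUCKETS) -> List[bytes]:
    """Key prefixes a reader has to scan to reassemble the logical row."""
    return [f"{b:02d}|{logical_row}|".encode("utf-8")
            for b in range(num_buckets)]
```

The trade-off is on the read side: writes to the hot row now land in up to NUM_BUCKETS key ranges, but reading the whole logical row back means one prefix scan per bucket instead of a single get.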
