On Fri, Dec 9, 2011 at 11:40 AM, yonghu <[email protected]> wrote:
> I read some discussions from the mail-list. It mentions the read and
> write operations for the same data object will be routed into the same
> RegionServer. This strategy can guarantee data consistency. But, how
> about availability?  If this RegionServer is down or temporarily not
> available, the master will assign a new RegionServer for processing
> data request or just wait until that RegionServer comes back? If master
> assigns new RegionServer, how can new RegionServer obtain data?
>

If a regionserver is down long enough that it loses its lease in
ZooKeeper, another regionserver will be told to serve the unavailable
regionserver's data.

Data is kept in HDFS, a distributed filesystem, to which the new
regionserver has access.
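
To make the failover idea concrete, here is a toy Python model. It is not HBase code: the class, its method names, and the timeout values are all made up for illustration. It only shows the general shape: a server that stops heartbeating for longer than its lease loses its regions to a live server, which works because the data itself sits in shared storage.

```python
class ToyCluster:
    """Toy model of lease-based region reassignment (hypothetical, not HBase code)."""

    def __init__(self, lease_timeout):
        self.lease_timeout = lease_timeout  # seconds without a heartbeat before the lease is lost
        self.last_heartbeat = {}            # server name -> time of last heartbeat
        self.assignment = {}                # region name -> server name

    def heartbeat(self, server, now):
        self.last_heartbeat[server] = now

    def assign(self, region, server):
        self.assignment[region] = server

    def live_servers(self, now):
        """Servers whose lease has not yet expired at time `now`."""
        return sorted(s for s, t in self.last_heartbeat.items()
                      if now - t <= self.lease_timeout)

    def reassign_expired(self, now):
        """Move regions off servers whose lease has expired; return the moved regions."""
        live = self.live_servers(now)
        moved = []
        for region, server in sorted(self.assignment.items()):
            if server not in live and live:
                # Pick the live server currently holding the fewest regions.
                target = min(live, key=lambda s: sum(
                    1 for v in self.assignment.values() if v == s))
                self.assignment[region] = target
                moved.append(region)
        return moved
```

A short outage (shorter than the lease) moves nothing in this model; only an expired lease triggers reassignment.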

> The other issue is about work-balance. If a huge amount of read and
> write operations only apply on a small set of data, one RegionServer
> may become a hot-spot. How does HBase deal with this problem?
>

If the hot spot is a single cell, there is not much you can do currently
(there has been talk of making read replicas available, but that is still
TODO).  If it is a single row, you'd need to redo your schema to spread
the row content.  If it is a region, you can split it and spread the
resulting daughter regions about the cluster.
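
One possible way to "spread the row content" is to turn one logical row into several physical rows by putting a bucket number in front of the row key. The Python sketch below is an assumption about how that could look, not an HBase API: the function names, the bucket count of 8, and the `NN|key` layout are invented for the example. The cost is that a reader has to fetch every bucket and merge the results.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical; choose based on how widely the load should spread


def bucketed_row_key(row_key: bytes, qualifier: bytes,
                     num_buckets: int = NUM_BUCKETS) -> bytes:
    """Return the physical row key under which one column of a logical row is stored."""
    # crc32 is stable across processes, unlike Python's built-in hash().
    bucket = zlib.crc32(qualifier) % num_buckets
    return b"%02d|" % bucket + row_key


def all_bucket_keys(row_key: bytes, num_buckets: int = NUM_BUCKETS) -> list:
    """Every physical row key a reader must fetch to rebuild the logical row."""
    return [b"%02d|" % b + row_key for b in range(num_buckets)]
```

Because the bucket is the leading part of the key, the physical rows sort into different key ranges, so they can land in different regions once the table is split along those ranges.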


> The last question is about data replica.  The HBase data is still
> stored in HDFS. HDFS will use eager synchronization (pipelining) to
> synchronize all replicas. If HBase write data into HDFS, when should
> HDFS return the write finishing acknowledge to HBase, just waiting
> until one replica update or until all replicas update?
>

All.  The write is acknowledged only after every replica in the
pipeline has received it.
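
As a rough picture of what waiting for all replicas means in a pipelined write, here is a toy Python model. It is not HDFS code: the function and the dict layout for a datanode are invented for the sketch, and it ignores the pipeline recovery a real cluster would attempt when a node fails.

```python
def pipeline_write(packet, datanodes):
    """Toy pipelined write: return True only if every replica stored the packet.

    `datanodes` is a list of dicts such as
    {"name": "dn1", "up": True, "blocks": []}; this structure is hypothetical.
    """
    if not datanodes:
        return True  # end of the pipeline: nothing downstream left to wait for
    head, rest = datanodes[0], datanodes[1:]
    if not head["up"]:
        return False  # a dead node breaks the pipeline, so no ack flows back
    head["blocks"].append(packet)        # store the packet locally
    return pipeline_write(packet, rest)  # ack upstream only after downstream acked
```

With three healthy nodes the call returns True and each node holds the packet; if any node in the list is down, the writer never sees a positive ack in this model.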

St.Ack
