[ https://issues.apache.org/jira/browse/HBASE-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695243#comment-13695243 ]
Anoop Sam John commented on HBASE-8815:
---
I am also working on a similar item and was thinking about the possibilities.
A client layer on top of HTable could be built that automatically switches to
the peer cluster. I will look into this more next week and get back. (I have
some more open questions in mind :) )
A replicated cross cluster client
-
Key: HBASE-8815
URL: https://issues.apache.org/jira/browse/HBASE-8815
Project: HBase
Issue Type: New Feature
Reporter: Varun Sharma
I would like to float this idea for brainstorming.
HBase is a strongly consistent system modelled after Bigtable, which means
that, as it stands today, a machine going down results in a loss of
availability of around 2 minutes. So there is a trade-off.
However, for high availability and redundancy, it is common practice for
online/mission-critical applications to run replicated clusters. For example,
we run replicated clusters at Pinterest in different EC2 AZs, and at Google,
critical data is always replicated across Bigtable cells.
At high volumes, 2 minutes of downtime can also be critical. However, today
our client does not make use of the fact that there is an available slave
replica cluster from which slightly inconsistent data can be read. It only
reads from one cluster. When you have replication, it is very common
practice to read from the slave if the error rate from the master is high.
That is how web sites serve data out of MySQL and survive machine failures:
by directing their reads to slave machines when the master goes down.
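As a rough illustration of the "read from the slave if the error rate from
the master is high" practice, here is a minimal sketch in Python. It is not
HBase code: the class name ErrorRateRouter, the window size, and the
threshold are all hypothetical and chosen only for the example.

```python
from collections import deque


class ErrorRateRouter:
    """Picks 'master' or 'slave' for reads from the recent master error rate.

    Hypothetical helper for illustration; not part of any HBase API.
    """

    def __init__(self, window=100, threshold=0.5):
        self.threshold = threshold            # error fraction that triggers the switch
        self.outcomes = deque(maxlen=window)  # recent master reads: True = failed

    def record(self, failed):
        # Remember the outcome of one read attempted against the master.
        self.outcomes.append(bool(failed))

    def error_rate(self):
        # Fraction of failed reads in the window; 0.0 when nothing is recorded.
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def target(self):
        # Route to the slave only while the master error rate exceeds the threshold.
        return "slave" if self.error_rate() > self.threshold else "master"
```

Note that this sketch only learns about the master from reads actually sent
to it, so a real client would also need some way of probing the master in
order to switch back once it recovers.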
I am sure folks love the strong consistency guarantee from HBase, but I think
that this way, we can make better use of the replica cluster, much in the
same way people use MySQL slaves for reads. In case of regions going offline,
it would be nice if, for the offline regions only (a small fraction), reads
could be directed to the slave cluster.
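The per-read fallback described above could look roughly like the following
Python sketch. It is only a sketch under stated assumptions: FallbackReader,
RegionOfflineError, and the two get callables are hypothetical stand-ins for
whatever the real client would use to read from each cluster and to detect
an offline region.

```python
class RegionOfflineError(Exception):
    """Hypothetical error: the region serving the requested row is unavailable."""


class FallbackReader:
    """Reads from the primary cluster, falling back to the replica per read."""

    def __init__(self, primary_get, replica_get):
        # Both arguments are callables taking a row key and returning a value.
        self.primary_get = primary_get
        self.replica_get = replica_get

    def get(self, row):
        """Return (value, source) so callers know when data may be stale."""
        try:
            return self.primary_get(row), "primary"
        except RegionOfflineError:
            # Only this read is redirected; rows in healthy regions still
            # go to the primary and keep strong consistency.
            return self.replica_get(row), "replica"
```

Returning the source alongside the value lets the application decide
whether slightly inconsistent data from the slave cluster is acceptable for
a given read.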
I know one company which follows this model. At Google, a replicated client
API is used for reads, which is able to farm reads out to multiple clusters
and, in the case of multi-master replication, also writes to multiple
clusters depending on availability.