Re: Implementing Read Replica feature in open source HBase?
Hi Wellington,

No, if that's what you mean, there's no technical design doc with more detail than what we've already shared and what the AWS paper covers. So far we've identified two tasks (meta table suffix and global read-only mode). Each is tracked as a Jira sub-task, and we've started implementing them on the feature branch.

The design doc gives a high-level overview of what we would like to implement. It's a bit short because many details are already available in the AWS paper, and I didn't want to repeat them. We expect to discuss further implementation details in the Jira tickets.

Please let me know if you see room for improvement.

Thanks,
Andor

On Mon, 2025-02-03 at 12:15 +, Wellington Chevreuil wrote:
> Is there a technical design doc that could be attached to the umbrella
> ticket, or shared in this thread?
> > > > > > In the Open Source area we also have room for improvement: > > > support > > > other object store providers, automate processes which are > > > currently > > > manual, and so on. > > > > > > We’d greatly appreciate your feedback in the document: whether > > > it’s > > > about the viability of the idea, areas for improvement, or > > > suggestions > > > to simplify the approach. > > > > > > [1] > > > > > https://aws.amazon.com/blogs/big-data/setting-up-read-replica-clusters-with-hbase-on-amazon-s3/ > > > > > > [2] > > > > > https://docs.google.com/document/d/1EI0lsURX1BZhv3DYgMvZCl4EUy-ADJRkHUc1PjzZtj0/edit?usp=sharing > > > > > > > > > Regards, > > > > > > Andor > > > > > > > > > >
Re: Implementing Read Replica feature in open source HBase?
Hi Andor,

Thanks for sharing this scope document.

Is there a technical design doc that could be attached to the umbrella ticket, or shared in this thread?

Regards.

On Tue, 14 Jan 2025 at 21:29, Andor Molnar wrote:
> Created feature branch:
>
> https://github.com/apache/hbase/tree/HBASE-29081
>
> and umbrella Jira ticket:
>
> https://issues.apache.org/jira/browse/HBASE-29081
Re: Implementing Read Replica feature in open source HBase?
Created feature branch:

https://github.com/apache/hbase/tree/HBASE-29081

and umbrella Jira ticket:

https://issues.apache.org/jira/browse/HBASE-29081

Regards,
Andor

On Mon, 2025-01-13 at 11:10 -0600, Andor Molnar wrote:
> Hello everyone,
>
> We've been discussing an idea internally at Cloudera about implementing
> an open source version of Amazon's EMR Read Replica Clusters on Amazon S3 [1].
Implementing Read Replica feature in open source HBase?
Hello everyone,

We've been discussing an idea internally at Cloudera about implementing an open source version of Amazon's EMR Read Replica Clusters on Amazon S3 [1]. This feature would let us run HBase in read-only mode while another cluster runs on the same storage location, and it could benefit our customers. The main advantages are optimizing object storage costs in the cloud and sharing the read workload between multiple clusters.

The open source version also leaves room for improvement: supporting other object store providers, automating processes that are currently manual, and so on.

We'd greatly appreciate your feedback on the document [2], whether it's about the viability of the idea, areas for improvement, or suggestions to simplify the approach.

[1] https://aws.amazon.com/blogs/big-data/setting-up-read-replica-clusters-with-hbase-on-amazon-s3/
[2] https://docs.google.com/document/d/1EI0lsURX1BZhv3DYgMvZCl4EUy-ADJRkHUc1PjzZtj0/edit?usp=sharing

Regards,
Andor
