Re: Implementing Read Replica feature in open source HBase?

2025-02-03 Thread Andor Molnar
Hi Wellington,

No, there's no technical design doc with a higher level of detail
than what we've already shared and what the AWS paper covers, if
that's what you mean.

We've identified two tasks so far (meta table suffix and global
read-only mode), which are tracked as Jira sub-tasks, and we've
started implementing them on the feature branch.

The design doc gives a high-level overview of what we would like to
implement. It's a bit short because many details are already
available in the AWS paper, and I didn't want to repeat them. Further
implementation details are expected to be discussed in the Jira
tickets.

Please let me know if you see room for improvement.

Thanks,
Andor



On Mon, 2025-02-03 at 12:15 +, Wellington Chevreuil wrote:
> Hi Andor,
> 
> Thanks for sharing this scope document.
> 
> Is there a technical design doc that could be attached to the
> umbrella ticket, or shared in this thread?
> 
> Regards.



Re: Implementing Read Replica feature in open source HBase?

2025-02-03 Thread Wellington Chevreuil
Hi Andor,

Thanks for sharing this scope document.

Is there a technical design doc that could be attached to the umbrella
ticket, or shared in this thread?

Regards.

On Tue, 14 Jan 2025 at 21:29, Andor Molnar wrote:

> Created feature branch:
>
> https://github.com/apache/hbase/tree/HBASE-29081
>
> and umbrella Jira ticket:
>
> https://issues.apache.org/jira/browse/HBASE-29081
>
>
> Regards,
> Andor


Re: Implementing Read Replica feature in open source HBase?

2025-01-14 Thread Andor Molnar
Created feature branch:

https://github.com/apache/hbase/tree/HBASE-29081

and umbrella Jira ticket:

https://issues.apache.org/jira/browse/HBASE-29081


Regards,
Andor




On Mon, 2025-01-13 at 11:10 -0600, Andor Molnar wrote:
> Hello everyone,
> 
> We've been discussing an idea internally at Cloudera: implementing an
> open source version of Amazon's EMR Read Replica Clusters on Amazon
> S3 [1]. A feature that would let us run HBase in read-only mode, with
> another cluster running on the same storage location, could
> potentially benefit our customers. The main advantages are optimizing
> the cost of object storage in the cloud and sharing the read workload
> between multiple clusters.
>
> In the open source version we also have room for improvement:
> supporting other object store providers, automating processes that
> are currently manual, and so on.
>
> We'd greatly appreciate your feedback on the document [2], whether
> it's about the viability of the idea, areas for improvement, or
> suggestions to simplify the approach.
> 
> [1]
> https://aws.amazon.com/blogs/big-data/setting-up-read-replica-clusters-with-hbase-on-amazon-s3/
> 
> [2]
> https://docs.google.com/document/d/1EI0lsURX1BZhv3DYgMvZCl4EUy-ADJRkHUc1PjzZtj0/edit?usp=sharing
> 
> 
> Regards,
> 
> Andor
> 
> 



Implementing Read Replica feature in open source HBase?

2025-01-13 Thread Andor Molnar
Hello everyone,

We've been discussing an idea internally at Cloudera: implementing an
open source version of Amazon's EMR Read Replica Clusters on Amazon
S3 [1]. A feature that would let us run HBase in read-only mode, with
another cluster running on the same storage location, could
potentially benefit our customers. The main advantages are optimizing
the cost of object storage in the cloud and sharing the read workload
between multiple clusters.

In the open source version we also have room for improvement:
supporting other object store providers, automating processes that
are currently manual, and so on.

We'd greatly appreciate your feedback on the document [2], whether
it's about the viability of the idea, areas for improvement, or
suggestions to simplify the approach.

[1]
https://aws.amazon.com/blogs/big-data/setting-up-read-replica-clusters-with-hbase-on-amazon-s3/

[2]
https://docs.google.com/document/d/1EI0lsURX1BZhv3DYgMvZCl4EUy-ADJRkHUc1PjzZtj0/edit?usp=sharing


Regards,

Andor