Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc

2018-08-07 Thread Toshihiro Suzuki
Hi Josh,

I’m interested in this, too. I’ll start to read and follow what you suggested 
to Jack.

Thanks,
Toshi

> On Aug 7, 2018, at 09:12, Jack Bearden  wrote:
> 
> Thanks Josh that was helpful. I'll start doing some of my own research
> around these topics and look into that Ratis ticket. Much appreciated!
> 
> On Mon, Aug 6, 2018 at 8:04 AM, Josh Elser  wrote:
> 
>> Yup, replication is a big one to "unravel". Repeating myself from a branch
>> in the thread, but I'd expect some initial suggestions on what a new API
>> could be this week. Certainly the first draft won't be the final -- would
>> be great to get your input after your AsyncWAL work, Duo.
>> 
>> Using AWS SQS (Simple Queue Service), or pretty much anything else, would be great. I
>> want to make sure that, while we try to "scratch this one itch", we pave
>> the way for whatever else folks want to experiment with.
>> 
>> 
>> On 8/4/18 5:10 AM, 张铎(Duo Zhang) wrote:
>> 
>>> Yes, maybe we could write WAL to SQS and HFile to S3, then we can deploy
>>> HBase on AWS without any local storage volumes...
>>> 
>>> But we also need a good abstraction for Replication, as the current design
>>> is file based...
>>> 
>>> 2018-07-27 1:28 GMT+08:00 Zach York :
>>> 
>>>> I would REALLY hope that the WAL interface/API changes would go into master
>>>> even if the feature work for Ratis is going in a feature branch. Not only
>>>> would this enable other backends to be developed in parallel with the Ratis
>>>> solution if there are other good fits for a non-HDFS WAL, but also it would
>>>> save the burden of having to rebase these core changes onto the latest
>>>> master to maintain compatibility. I'm assuming the Ratis portion of the
>>>> code would be mostly new files so these would be less of a concern.
>>>>
>>>> On Thu, Jul 26, 2018 at 9:24 AM, Josh Elser  wrote:
>>>>
>>>>> On 7/26/18 1:00 AM, Stack wrote:
>>>>>
>>>>>>> All this said, I'd like to start moving toward the point where we start
>>>>>>> breaking out this work into a feature-branch off of master and start
>>>>>>> building code. My hope is that this is amenable to everyone, with the
>>>>>>> acknowledgment that the Ratis work is considered "experimental" and not
>>>>>>> an attempt to make all of HBase use Ratis-backed WALs.
>>>>>>
>>>>>> Go for it.
>>>>>>
>>>>>> The branch would have WAL API changes only or would it include Ratis WAL
>>>>>> dev? (If the latter, would that be better done over on Ratis project?).
>>>>>
>>>>> I think we would start with WAL API changes, get those "blessed", and then
>>>>> continue Ratis WAL dev after that.



Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc

2018-08-06 Thread Jack Bearden
Thanks Josh that was helpful. I'll start doing some of my own research
around these topics and look into that Ratis ticket. Much appreciated!

On Mon, Aug 6, 2018 at 8:04 AM, Josh Elser  wrote:

> Yup, replication is a big one to "unravel". Repeating myself from a branch
> in the thread, but I'd expect some initial suggestions on what a new API
> could be this week. Certainly the first draft won't be the final -- would
> be great to get your input after your AsyncWAL work, Duo.
>
> Using AWS SQS (Simple Queue Service), or pretty much anything else, would be great. I
> want to make sure that, while we try to "scratch this one itch", we pave
> the way for whatever else folks want to experiment with.
>
>
> On 8/4/18 5:10 AM, 张铎(Duo Zhang) wrote:
>
>> Yes, maybe we could write WAL to SQS and HFile to S3, then we can deploy
>> HBase on AWS without any local storage volumes...
>>
>> But we also need a good abstraction for Replication, as the current design
>> is file based...
>>
>> 2018-07-27 1:28 GMT+08:00 Zach York :
>>
>>> I would REALLY hope that the WAL interface/API changes would go into master
>>> even if the feature work for Ratis is going in a feature branch. Not only
>>> would this enable other backends to be developed in parallel with the Ratis
>>> solution if there are other good fits for a non-HDFS WAL, but also it would
>>> save the burden of having to rebase these core changes onto the latest
>>> master to maintain compatibility. I'm assuming the Ratis portion of the
>>> code would be mostly new files so these would be less of a concern.
>>>
>>> On Thu, Jul 26, 2018 at 9:24 AM, Josh Elser  wrote:
>>>
>>>> On 7/26/18 1:00 AM, Stack wrote:
>>>>
>>>>>> All this said, I'd like to start moving toward the point where we start
>>>>>> breaking out this work into a feature-branch off of master and start
>>>>>> building code. My hope is that this is amenable to everyone, with the
>>>>>> acknowledgment that the Ratis work is considered "experimental" and not
>>>>>> an attempt to make all of HBase use Ratis-backed WALs.
>>>>>
>>>>> Go for it.
>>>>>
>>>>> The branch would have WAL API changes only or would it include Ratis WAL
>>>>> dev? (If the latter, would that be better done over on Ratis project?).
>>>>
>>>> I think we would start with WAL API changes, get those "blessed", and then
>>>> continue Ratis WAL dev after that.


Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc

2018-08-06 Thread Josh Elser
Yup, replication is a big one to "unravel". Repeating myself from a 
branch in the thread, but I'd expect some initial suggestions on what a 
new API could be this week. Certainly the first draft won't be the final 
-- would be great to get your input after your AsyncWAL work, Duo.


Using AWS SQS (Simple Queue Service), or pretty much anything else, would be great. I 
want to make sure that, while we try to "scratch this one itch", we pave 
the way for whatever else folks want to experiment with.
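
To make the "file based" coupling concrete: replication today tails WAL files
by path and offset, so any non-file WAL also needs the consumer side
abstracted. A minimal sketch of the kind of seam under discussion might look
like the following -- purely illustrative, with invented names (LogEntrySource,
Entry, poll, acknowledge); it is not the API being proposed:

    import java.io.IOException;

    // Hypothetical sketch: a replication consumer that iterates logical WAL
    // entries and records progress by sequence id, so the backing store can be
    // HDFS files, SQS, a Ratis LogStream, or anything else.
    public interface LogEntrySource extends AutoCloseable {

      /** One logical WAL entry, independent of how or where it is stored. */
      final class Entry {
        public final long sequenceId;
        public final byte[] payload;   // serialized edit
        public Entry(long sequenceId, byte[] payload) {
          this.sequenceId = sequenceId;
          this.payload = payload;
        }
      }

      /** Returns the next entry, or null if none is available yet. */
      Entry poll() throws IOException;

      /** Durably records that everything up to sequenceId has been shipped. */
      void acknowledge(long sequenceId) throws IOException;
    }

The point is that progress would be tracked by sequence id rather than by file
name and byte offset, which is what would let a queue- or Ratis-backed WAL slot
in behind replication.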


On 8/4/18 5:10 AM, 张铎(Duo Zhang) wrote:

Yes, maybe we could write WAL to SQS and HFile to S3, then we can deploy
HBase on AWS without any local storage volumes...

But we also need a good abstraction for Replication, as the current design
is file based...

2018-07-27 1:28 GMT+08:00 Zach York :


I would REALLY hope that the WAL interface/API changes would go into master
even if the feature work for Ratis is going in a feature branch. Not only
would this enable other backends to be developed in parallel with the Ratis
solution if there are other good fits for a non-HDFS WAL, but also it would
save the burden of having to rebase these core changes onto the latest
master to maintain compatibility. I'm assuming the Ratis portion of the
code would be mostly new files so these would be less of a concern.

On Thu, Jul 26, 2018 at 9:24 AM, Josh Elser  wrote:


On 7/26/18 1:00 AM, Stack wrote:


All this said, I'd like to start moving toward the point where we start
breaking out this work into a feature-branch off of master and start
building code. My hope is that this is amenable to everyone, with the
acknowledgment that the Ratis work is considered "experimental" and not an
attempt to make all of HBase use Ratis-backed WALs.


Go for it.


The branch would have WAL API changes only or would it include Ratis WAL
dev? (If the latter, would that be better done over on Ratis project?).



I think we would start with WAL API changes, get those "blessed", and
then continue Ratis WAL dev after that.







Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc

2018-08-06 Thread Josh Elser

Thanks for your interest, Jack.

Taking a read through the "design doc" and related discussion is a good 
starting point. Re-linked here for your convenience [1]


https://issues.apache.org/jira/browse/HBASE-20951 has a break-down of 
work items. The first thing we need to work on is the WAL API. I know 
that our Ted and Ankit have been digging through the current codebase to 
get a starting point. I'd expect some initial "pseudocode" this week, 
but this will be a long pole to get something agreed upon, I imagine.
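
For a sense of the shape that first work item could take, here is a
deliberately tiny, hypothetical sketch of a storage-agnostic writer interface.
To be clear, this is not the API being drafted on HBASE-20951; the names
(WriteAheadLog, append, sync, trim) are invented purely for illustration:

    import java.io.IOException;
    import java.util.concurrent.CompletableFuture;

    // Hypothetical sketch: the minimal writer-side surface a region server
    // needs from a WAL, with no FileSystem/Path types leaking through, so an
    // HDFS-, Ratis-, or cloud-backed implementation could sit behind it.
    public interface WriteAheadLog extends AutoCloseable {

      /** Appends a serialized edit; completes with the assigned sequence id. */
      CompletableFuture<Long> append(byte[] serializedEdit) throws IOException;

      /** Completes once everything up to the given sequence id is durable. */
      CompletableFuture<Void> sync(long sequenceId) throws IOException;

      /** Marks entries up to sequenceId as no longer needed (e.g. after flush). */
      void trim(long sequenceId) throws IOException;
    }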


You can also look at https://issues.apache.org/jira/browse/RATIS-271 
where code will likely get committed sooner. A few of us 
have been playing around with the code there already, and doing a 
similar API "deep-dive" exercise.


And, if it doesn't go without saying, feel free to ask questions on the 
mailing list here or in Slack. I'll do my best to answer them in a 
timely fashion :)


- Josh

[1] 
https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?disco=CBm3RLM


On 8/3/18 2:42 PM, Jack Bearden wrote:

Great work! I'm excited about this feature and want to help with
development. What do you guys suggest are the best topics to ramp up in
preparation for these upcoming sprints?

On Fri, Aug 3, 2018 at 10:18 AM, Josh Elser  wrote:


Yup, we're on the same page :)


On 7/26/18 1:28 PM, Zach York wrote:


I would REALLY hope that the WAL interface/API changes would go into
master
even if the feature work for Ratis is going in a feature branch. Not only
would this enable other backends to be developed in parallel with the
Ratis
solution if there are other good fits for a non-HDFS WAL, but also it
would
save the burden of having to rebase these core changes onto the latest
master to maintain compatibility. I'm assuming the Ratis portion of the
code would be mostly new files so these would be less of a concern.

On Thu, Jul 26, 2018 at 9:24 AM, Josh Elser  wrote:

On 7/26/18 1:00 AM, Stack wrote:


All this said, I'd like to start moving toward the point where we start
breaking out this work into a feature-branch off of master and start
building code. My hope is that this is amenable to everyone, with the
acknowledgment that the Ratis work is considered "experimental" and not an
attempt to make all of HBase use Ratis-backed WALs.


Go for it.



The branch would have WAL API changes only or would it include Ratis WAL
dev? (If the latter, would that be better done over on Ratis project?).



I think we would start with WAL API changes, get those "blessed", and
then
continue Ratis WAL dev after that.








Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc

2018-08-04 Thread Duo Zhang
Yes, maybe we could write WAL to SQS and HFile to S3, then we can deploy
HBase on AWS without any local storage volumes...

But we also need a good abstraction for Replication, as the current design
is file based...
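
As a back-of-the-envelope illustration of the SQS half of that idea (not a
claim about how it would actually be built), appending an edit could be as
small as the sketch below. All of the hard parts -- ordering, the 256 KB
message-size limit, batching, delivery semantics -- are deliberately waved
away, and the queue URL is assumed to exist already:

    import java.util.Base64;

    import com.amazonaws.services.sqs.AmazonSQS;
    import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
    import com.amazonaws.services.sqs.model.SendMessageRequest;

    // Toy sketch only (AWS SDK for Java v1): push each WAL edit to an SQS
    // queue instead of appending to an HDFS file.
    public class SqsWalAppender {
      private final AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
      private final String queueUrl;

      public SqsWalAppender(String queueUrl) {
        this.queueUrl = queueUrl;
      }

      public void append(long sequenceId, byte[] serializedEdit) {
        // Prefix with the sequence id so a reader can re-order and de-duplicate;
        // a standard SQS queue guarantees neither ordering nor exactly-once.
        String body = sequenceId + ":" + Base64.getEncoder().encodeToString(serializedEdit);
        sqs.sendMessage(new SendMessageRequest(queueUrl, body));
      }
    }

The HFile half is closer to existing practice (pointing hbase.rootdir at an
s3a:// location), which is part of why the WAL is the piece that needs the new
abstraction.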

2018-07-27 1:28 GMT+08:00 Zach York :

> I would REALLY hope that the WAL interface/API changes would go into master
> even if the feature work for Ratis is going in a feature branch. Not only
> would this enable other backends to be developed in parallel with the Ratis
> solution if there are other good fits for a non-HDFS WAL, but also it would
> save the burden of having to rebase these core changes onto the latest
> master to maintain compatibility. I'm assuming the Ratis portion of the
> code would be mostly new files so these would be less of a concern.
>
> On Thu, Jul 26, 2018 at 9:24 AM, Josh Elser  wrote:
>
> > On 7/26/18 1:00 AM, Stack wrote:
> >
> >> All this said, I'd like to start moving toward the point where we start
> >>> breaking out this work into a feature-branch off of master and start
> >>> building code. My hope is that this is amenable to everyone, with the
> >>> acknowledgment that the Ratis work is considered "experimental" and not an
> >>> attempt to make all of HBase use Ratis-backed WALs.
> >>>
> >>>
> >>> Go for it.
> >>
> >> The branch would have WAL API changes only or would it include Ratis WAL
> >> dev? (If the latter, would that be better done over on Ratis project?).
> >>
> >
> > I think we would start with WAL API changes, get those "blessed", and
> then
> > continue Ratis WAL dev after that.
> >
>


Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc

2018-08-03 Thread Jack Bearden
Great work! I'm excited about this feature and want to help with
development. What do you guys suggest are the best topics to ramp up in
preparation for these upcoming sprints?

On Fri, Aug 3, 2018 at 10:18 AM, Josh Elser  wrote:

> Yup, we're on the same page :)
>
>
> On 7/26/18 1:28 PM, Zach York wrote:
>
>> I would REALLY hope that the WAL interface/API changes would go into
>> master
>> even if the feature work for Ratis is going in a feature branch. Not only
>> would this enable other backends to be developed in parallel with the
>> Ratis
>> solution if there are other good fits for a non-HDFS WAL, but also it
>> would
>> save the burden of having to rebase these core changes onto the latest
>> master to maintain compatibility. I'm assuming the Ratis portion of the
>> code would be mostly new files so these would be less of a concern.
>>
>> On Thu, Jul 26, 2018 at 9:24 AM, Josh Elser  wrote:
>>
>> On 7/26/18 1:00 AM, Stack wrote:
>>>
>>>> All this said, I'd like to start moving toward the point where we start
>>>> breaking out this work into a feature-branch off of master and start
>>>> building code. My hope is that this is amenable to everyone, with the
>>>> acknowledgment that the Ratis work is considered "experimental" and not an
>>>> attempt to make all of HBase use Ratis-backed WALs.
>>>
>>> Go for it.
>>>
>>> The branch would have WAL API changes only or would it include Ratis WAL
>>> dev? (If the latter, would that be better done over on Ratis project?).
>>
>> I think we would start with WAL API changes, get those "blessed", and then
>> continue Ratis WAL dev after that.


Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc

2018-08-03 Thread Josh Elser

Yup, we're on the same page :)

On 7/26/18 1:28 PM, Zach York wrote:

I would REALLY hope that the WAL interface/API changes would go into master
even if the feature work for Ratis is going in a feature branch. Not only
would this enable other backends to be developed in parallel with the Ratis
solution if there are other good fits for a non-HDFS WAL, but also it would
save the burden of having to rebase these core changes onto the latest
master to maintain compatibility. I'm assuming the Ratis portion of the
code would be mostly new files so these would be less of a concern.

On Thu, Jul 26, 2018 at 9:24 AM, Josh Elser  wrote:


On 7/26/18 1:00 AM, Stack wrote:


All this said, I'd like to start moving toward the point where we start
breaking out this work into a feature-branch off of master and start
building code. My hope is that this is amenable to everyone, with the
acknowledgment that the Ratis work is considered "experimental" and not an
attempt to make all of HBase use Ratis-backed WALs.


Go for it.


The branch would have WAL API changes only or would it include Ratis WAL
dev? (If the latter, would that be better done over on Ratis project?).



I think we would start with WAL API changes, get those "blessed", and then
continue Ratis WAL dev after that.





Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc

2018-07-26 Thread Zach York
I would REALLY hope that the WAL interface/API changes would go into master
even if the feature work for Ratis is going in a feature branch. Not only
would this enable other backends to be developed in parallel with the Ratis
solution if there are other good fits for a non-HDFS WAL, but also it would
save the burden of having to rebase these core changes onto the latest
master to maintain compatibility. I'm assuming the Ratis portion of the
code would be mostly new files so these would be less of a concern.
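
One way to picture the "other backends in parallel" point: HBase already
chooses its WAL implementation by configuration, so once the API seam is in
master, an experimental backend is mostly additive code behind another
provider name. A hedged sketch -- "hbase.wal.provider", "filesystem", and
"asyncfs" exist today, while the "ratis" value is hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class WalProviderConfigExample {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Today: pick one of the shipped, HDFS-backed providers.
        conf.set("hbase.wal.provider", "asyncfs");
        // Hypothetical future: an experimental backend developed on its own
        // branch could plug in as just another value, no core rebases needed.
        // conf.set("hbase.wal.provider", "ratis");
        System.out.println("WAL provider = " + conf.get("hbase.wal.provider"));
      }
    }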

On Thu, Jul 26, 2018 at 9:24 AM, Josh Elser  wrote:

> On 7/26/18 1:00 AM, Stack wrote:
>
>> All this said, I'd like to start moving toward the point where we start
>>> breaking out this work into a feature-branch off of master and start
>>> building code. My hope is that this is amenable to everyone, with the
>>> acknowledgment that the Ratis work is considered "experimental" and not an
>>> attempt to make all of HBase use Ratis-backed WALs.
>>>
>>>
>>> Go for it.
>>
>> The branch would have WAL API changes only or would it include Ratis WAL
>> dev? (If the latter, would that be better done over on Ratis project?).
>>
>
> I think we would start with WAL API changes, get those "blessed", and then
> continue Ratis WAL dev after that.
>


Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc

2018-07-26 Thread Josh Elser

On 7/26/18 1:00 AM, Stack wrote:

All this said, I'd like to start moving toward the point where we start
breaking out this work into a feature-branch off of master and start
building code. My hope is that this is amenable to everyone, with the
acknowledgment that the Ratis work is considered "experimental" and not an
attempt to make all of HBase use Ratis-backed WALs.



Go for it.

The branch would have WAL API changes only or would it include Ratis WAL
dev? (If the latter, would that be better done over on Ratis project?).


I think we would start with WAL API changes, get those "blessed", and 
then continue Ratis WAL dev after that.


Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc

2018-07-25 Thread Stack
On Wed, Jul 25, 2018 at 11:55 AM Josh Elser  wrote:

> ...
> My biggest take-away is that I complicated this document by tying it too
> closely with "HBase on Cloud", treating the WAL+Ratis LogService as the
> only/biggest thing to figure out. This was inaccurate and overly bold of
> me: I apologize. I think this complicated discussion on a number of
> points, and ate a good bit of your time.
>
>
No need of apology.

There was healthy back and forth. You read the feedback and took it on
board.

(See below).



> My goal was to present this as an important part of a transition to the
> "cloud", giving justification to what WAL+Ratis helps HBase achieve. I
> did not want this document to be a step-by-step guide to a perfect HBase
> on Cloud design. I need to do a better job with this in the future; sorry.

> That said, my feeling is that, on the whole, folks are in support of the
> proposed changes/architecture described for the WAL+Ratis work (tl;dr
> revisit WAL API, plug in current WAL implementation to any API
> modification, build new Ratis-backed WAL impl). There were some concerns
> which still need immediate action that I am aware of:
>
> * Sync with Ram and Anoop re: in-memory WAL [1]
> * Where is Ratis LogService metadata kept? How do we know what
> LogStreams were being used/maintained by a RS? How does this tie into
> recovery?
>
> There are also long-term concerns which I don't think I have an answer
> for yet (for either reasons out of my control or a lack of technical
> understanding):
>
> * Maturity of the Ratis community
> * Required performance by HBase and the ability of the LogService to
> provide that perf (Areas already mentioned: gRPC perf, fsyncs bogging
> down disks, ability to scale RAFT quorums).
> * Continue with WAL-per-RS or move to WAL-per-Region? Related to perf,
> dependent upon Ratis scalability.
> * I/O amplification on WAL retention for backup and replication
> ("logstream export")
> * Ensure that LogStreams can be exported to a dist-filesystem in a
> manner which requires no additional metadata/handling (avoid more
> storage/mgmt complexity)
> * Ability to build krb5 authn into Ratis (really, gRPC)
>
> I will continue the two immediate action items. I think the latter
> concerns are some that will require fingers-on-keyboard -- I don't know
> enough about runtime characteristics without seeing it for myself.
>
> All this said, I'd like to start moving toward the point where we start
> breaking out this work into a feature-branch off of master and start
> building code. My hope is that this is amenable to everyone, with the
> acknowledgment that the Ratis work is considered "experimental" and not an
> attempt to make all of HBase use Ratis-backed WALs.
>
>

Go for it.

The branch would have WAL API changes only or would it include Ratis WAL
dev? (If the latter, would that be better done over on Ratis project?).

S


> Finally, I do *not* want this message to be interpreted as me squashing
> anyone's concerns. My honest opinion is that discussion has died down,
> but I will be the first to apologize if I have missed any outstanding
> concerns. Please, please, please ping me if I am negligent.
>
> Thanks once again for everyone's participation.
>
> [1]
>
> https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?disco=CBm3RLM
>
> On 2018/07/13 20:15:45, Josh Elser  wrote:
> > Hi all,
> >
> > A long time ago, I shared a document about a (I'll call it..) "vision"
> > where we make some steps towards decoupling HBase from HDFS in an effort
> > to make deploying HBase on Cloud IaaS providers a bit easier
> > (operational simplicity, effective use of common IaaS paradigms, etc).
> >
> >
> https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?usp=sharing
> >
> > A good ask from our Stack back then was: "[can you break down this
> > work]?" The original document was very high-level, and asking for some
> > more details makes a lot of sense. Months later, I'd like to share that
> > I've updated the original document with some new content at the bottom
> > (as well as addressed some comments which went unanswered by me --
> sorry!)
> >
> > Based on a discussion I had earlier this week (and some discussions
> > during HBaseCon in California in June), I've tried to add a brief
> > "refresher" on what some of the big goals for this effort are. Please
> > check it out at your leisure and let me know what you think. Would like
> > to start getting some fingers behind this all and pump out some code :)
> >
> >
> https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#bookmark=id.fml9ynrqagk
> >
> > - Josh
> >
>


Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc

2018-07-25 Thread Josh Elser

Thanks, Zach!

I like your suggestion about project updates. I sincerely hope that this 
can be something transparent enough that folks who want to follow on and 
participate in implementation can do so. Let me think about how to drive 
this better.


On 7/25/18 3:55 PM, Zach York wrote:

+1 to starting the work. I think most of the concerns can be figured out on
the JIRAs and we can have a project update every X weeks if enough people
are interested.

I also agree to frame the feature correctly. Decoupling from an HDFS WAL or
WAL on Ratis would be more appropriate names that would better convey the
scope. I think there are a number of projects necessary to complete "HBase
on Cloud" with this being one of those.


Thanks for driving this initiative!

Zach


On Wed, Jul 25, 2018 at 11:55 AM, Josh Elser  wrote:


Let me give an update on-list for everyone:

First and foremost, thank you very much to everyone who took the time to
read this, with an extra thanks to those who participated in discussion.
There were lots of great points raised. Some about things that were unclear
in the doc, and others shining light onto subjects I hadn't considered yet.

My biggest take-away is that I complicated this document by tying it too
closely with "HBase on Cloud", treating the WAL+Ratis LogService as the
only/biggest thing to figure out. This was inaccurate and overly bold of
me: I apologize. I think this complicated discussion on a number of points,
and ate a good bit of your time.

My goal was to present this as an important part of a transition to the
"cloud", giving justification to what WAL+Ratis helps HBase achieve. I did
not want this document to be a step-by-step guide to a perfect HBase on
Cloud design. I need to do a better job with this in the future; sorry.

That said, my feeling is that, on the whole, folks are in support of the
proposed changes/architecture described for the WAL+Ratis work (tl;dr
revisit WAL API, plug in current WAL implementation to any API
modification, build new Ratis-backed WAL impl). There were some concerns
which still need immediate action that I am aware of:

* Sync with Ram and Anoop re: in-memory WAL [1]
* Where is Ratis LogService metadata kept? How do we know what LogStreams
were being used/maintained by a RS? How does this tie into recovery?

There are also long-term concerns which I don't think I have an answer for
yet (for either reasons out of my control or a lack of technical
understanding):

* Maturity of the Ratis community
* Required performance by HBase and the ability of the LogService to
provide that perf (Areas already mentioned: gRPC perf, fsyncs bogging down
disks, ability to scale RAFT quorums).
* Continue with WAL-per-RS or move to WAL-per-Region? Related to perf,
dependent upon Ratis scalability.
* I/O amplification on WAL retention for backup and replication
("logstream export")
* Ensure that LogStreams can be exported to a dist-filesystem in a manner
which requires no additional metadata/handling (avoid more storage/mgmt
complexity)
* Ability to build krb5 authn into Ratis (really, gRPC)

I will continue the two immediate action items. I think the latter
concerns are some that will require fingers-on-keyboard -- I don't know
enough about runtime characteristics without seeing it for myself.

All this said, I'd like to start moving toward the point where we start
breaking out this work into a feature-branch off of master and start
building code. My hope is that this is amenable to everyone, with the
acknowledgment that the Ratis work is considered "experimental" and not an
attempt to make all of HBase use Ratis-backed WALs.

Finally, I do *not* want this message to be interpreted as me squashing
anyone's concerns. My honest opinion is that discussion has died down, but
I will be the first to apologize if I have missed any outstanding concerns.
Please, please, please ping me if I am negligent.

Thanks once again for everyone's participation.

[1] https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?disco=CBm3RLM

On 2018/07/13 20:15:45, Josh Elser  wrote:

Hi all,



A long time ago, I shared a document about a (I'll call it..) "vision"
where we make some steps towards decoupling HBase from HDFS in an effort to
make deploying HBase on Cloud IaaS providers a bit easier (operational
simplicity, effective use of common IaaS paradigms, etc).

https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?usp=sharing

A good ask from our Stack back then was: "[can you break down this
work]?" The original document was very high-level, and asking for some more
details makes a lot of sense. Months later, I'd like to share that I've
updated the original document with some new content at the bottom (as well
as addressed some comments which went unanswered by me -- sorry!)

Based on a discussion I had earlier this week (and some discussions
during HBaseCon in California in June), I've tried to add a brief

Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc

2018-07-25 Thread Josh Elser
Thanks, Andrew. I was really upset that I was butting heads with you 
when I had previously thought that I had a design which was in 
line with something you would have called "good".


I will wholly take the blame for not having an as-clear-as-possible 
design doc. I am way down in the weeds and didn't bring myself up for 
air before trying to write something consumable for everyone else.


Making a good API is my biggest goal for the HBase side, and my hope is 
that it will support this experiment, enable others who want to try out 
other systems, and simplify our existing WAL implementations.


Thanks for the reply.

On 7/25/18 3:50 PM, Andrew Purtell wrote:

My biggest take-away is that I complicated this document by tying it too
closely with "HBase on Cloud", treating the WAL+Ratis LogService as the
only/biggest thing to figure out.

Understanding this now helps a lot in making sense of the positions taken
in the doc. At first glance it read as an initially interesting document
that quickly went to a weird place where there was a preconceived solution
working backward toward a problem, engineering run in reverse. I think it's
perfectly fine if the Ratis podling and those associated with it want to
drive development and/or adoption by finding candidate use cases in other
ecosystem projects. As long as we have good interfaces which don't leak
internals, no breaking core changes, no hard dependencies on incubating
artifacts, and at least a potential path forward to alternate
implementations it's all good!

On Wed, Jul 25, 2018 at 11:55 AM Josh Elser  wrote:


Let me give an update on-list for everyone:

First and foremost, thank you very much to everyone who took the time to
read this, with an extra thanks to those who participated in discussion.
There were lots of great points raised. Some about things that were
unclear in the doc, and others shining light onto subjects I hadn't
considered yet.

My biggest take-away is that I complicated this document by tying it too
closely with "HBase on Cloud", treating the WAL+Ratis LogService as the
only/biggest thing to figure out. This was inaccurate and overly bold of
me: I apologize. I think this complicated discussion on a number of
> points, and ate a good bit of your time.

My goal was to present this as an important part of a transition to the
"cloud", giving justification to what WAL+Ratis helps HBase achieve. I
did not want this document to be a step-by-step guide to a perfect HBase
on Cloud design. I need to do a better job with this in the future; sorry.

That said, my feeling is that, on the whole, folks are in support of the
proposed changes/architecture described for the WAL+Ratis work (tl;dr
revisit WAL API, plug in current WAL implementation to any API
modification, build new Ratis-backed WAL impl). There were some concerns
which still need immediate action that I am aware of:

* Sync with Ram and Anoop re: in-memory WAL [1]
* Where is Ratis LogService metadata kept? How do we know what
LogStreams were being used/maintained by a RS? How does this tie into
recovery?

There are also long-term concerns which I don't think I have an answer
for yet (for either reasons out of my control or a lack of technical
understanding):

* Maturity of the Ratis community
* Required performance by HBase and the ability of the LogService to
provide that perf (Areas already mentioned: gRPC perf, fsyncs bogging
down disks, ability to scale RAFT quorums).
* Continue with WAL-per-RS or move to WAL-per-Region? Related to perf,
dependent upon Ratis scalability.
* I/O amplification on WAL retention for backup and replication
("logstream export")
* Ensure that LogStreams can be exported to a dist-filesystem in a
manner which requires no additional metadata/handling (avoid more
storage/mgmt complexity)
* Ability to build krb5 authn into Ratis (really, gRPC)

I will continue the two immediate action items. I think the latter
concerns are some that will require fingers-on-keyboard -- I don't know
enough about runtime characteristics without seeing it for myself.

All this said, I'd like to start moving toward the point where we start
breaking out this work into a feature-branch off of master and start
building code. My hope is that this is amenable to everyone, with the
acknowledgment that the Ratis work is considered "experimental" and not an
attempt to make all of HBase use Ratis-backed WALs.

Finally, I do *not* want this message to be interpreted as me squashing
anyone's concerns. My honest opinion is that discussion has died down,
but I will be the first to apologize if I have missed any outstanding
concerns. Please, please, please ping me if I am negligent.

Thanks once again for everyone's participation.

[1]

https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?disco=CBm3RLM

On 2018/07/13 20:15:45, Josh Elser  wrote:

Hi all,


A long time ago, I shared a document about a (I'll call it..) "vision"
where we make 

Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc

2018-07-25 Thread Zach York
+1 to starting the work. I think most of the concerns can be figured out on
the JIRAs and we can have a project update every X weeks if enough people
are interested.

I also agree to frame the feature correctly. Decoupling from an HDFS WAL or
WAL on Ratis would be more appropriate names that would better convey the
scope. I think there are a number of projects necessary to complete "HBase
on Cloud" with this being one of those.


Thanks for driving this initiative!

Zach


On Wed, Jul 25, 2018 at 11:55 AM, Josh Elser  wrote:

> Let me give an update on-list for everyone:
>
> First and foremost, thank you very much to everyone who took the time to
> read this, with an extra thanks to those who participated in discussion.
> There were lots of great points raised. Some about things that were unclear
> in the doc, and others shining light onto subjects I hadn't considered yet.
>
> My biggest take-away is that I complicated this document by tying it too
> closely with "HBase on Cloud", treating the WAL+Ratis LogService as the
> only/biggest thing to figure out. This was inaccurate and overly bold of
> me: I apologize. I think this complicated discussion on a number of points,
> and ate a good bit of your time.
>
> My goal was to present this as an important part of a transition to the
> "cloud", giving justification to what WAL+Ratis helps HBase achieve. I did
> not want this document to be a step-by-step guide to a perfect HBase on
> Cloud design. I need to do a better job with this in the future; sorry.
>
> That said, my feeling is that, on the whole, folks are in support of the
> proposed changes/architecture described for the WAL+Ratis work (tl;dr
> revisit WAL API, plug in current WAL implementation to any API
> modification, build new Ratis-backed WAL impl). There were some concerns
> which still need immediate action that I am aware of:
>
> * Sync with Ram and Anoop re: in-memory WAL [1]
> * Where is Ratis LogService metadata kept? How do we know what LogStreams
> were being used/maintained by a RS? How does this tie into recovery?
>
> There are also long-term concerns which I don't think I have an answer for
> yet (for either reasons out of my control or a lack of technical
> understanding):
>
> * Maturity of the Ratis community
> * Required performance by HBase and the ability of the LogService to
> provide that perf (Areas already mentioned: gRPC perf, fsyncs bogging down
> disks, ability to scale RAFT quorums).
> * Continue with WAL-per-RS or move to WAL-per-Region? Related to perf,
> dependent upon Ratis scalability.
> * I/O amplification on WAL retention for backup and replication
> ("logstream export")
> * Ensure that LogStreams can be exported to a dist-filesystem in a manner
> which requires no additional metadata/handling (avoid more storage/mgmt
> complexity)
> * Ability to build krb5 authn into Ratis (really, gRPC)
>
> I will continue the two immediate action items. I think the latter
> concerns are some that will require fingers-on-keyboard -- I don't know
> enough about runtime characteristics without seeing it for myself.
>
> All this said, I'd like to start moving toward the point where we start
> breaking out this work into a feature-branch off of master and start
> building code. My hope is that this is amenable to everyone, with the
> acknowledgment that the Ratis work is considered "experimental" and not an
> attempt to make all of HBase use Ratis-backed WALs.
>
> Finally, I do *not* want this message to be interpreted as me squashing
> anyone's concerns. My honest opinion is that discussion has died down, but
> I will be the first to apologize if I have missed any outstanding concerns.
> Please, please, please ping me if I am negligent.
>
> Thanks once again for everyone's participation.
>
> [1] https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?disco=CBm3RLM
>
> On 2018/07/13 20:15:45, Josh Elser  wrote:
>> Hi all,
>
>>
>> A long time ago, I shared a document about a (I'll call it..) "vision"
>> where we make some steps towards decoupling HBase from HDFS in an effort to
>> make deploying HBase on Cloud IaaS providers a bit easier (operational
>> simplicity, effective use of common IaaS paradigms, etc).
>>
>> https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?usp=sharing
>>
>> A good ask from our Stack back then was: "[can you break down this
>> work]?" The original document was very high-level, and asking for some more
>> details makes a lot of sense. Months later, I'd like to share that I've
>> updated the original document with some new content at the bottom (as well
>> as addressed some comments which went unanswered by me -- sorry!)
>>
>> Based on a discussion I had earlier this week (and some discussions
>> during HBaseCon in California in June), I've tried to add a brief
>> "refresher" on what some of the big goals for this effort are. Please check
>> it out at your leisure and let me know what 

Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc

2018-07-25 Thread Andrew Purtell
> My biggest take-away is that I complicated this document by tying it too
> closely with "HBase on Cloud", treating the WAL+Ratis LogService as the
> only/biggest thing to figure out.

Understanding this now helps a lot in making sense of the positions taken
in the doc. At first glance it read as an initially interesting document
that quickly went to a weird place where there was a preconceived solution
working backward toward a problem, engineering run in reverse. I think it's
perfectly fine if the Ratis podling and those associated with it want to
drive development and/or adoption by finding candidate use cases in other
ecosystem projects. As long as we have good interfaces which don't leak
internals, no breaking core changes, no hard dependencies on incubating
artifacts, and at least a potential path forward to alternate
implementations it's all good!

On Wed, Jul 25, 2018 at 11:55 AM Josh Elser  wrote:

> Let me give an update on-list for everyone:
>
> First and foremost, thank you very much to everyone who took the time to
> read this, with an extra thanks to those who participated in discussion.
> There were lots of great points raised. Some about things that were
> unclear in the doc, and others shining light onto subjects I hadn't
> considered yet.
>
> My biggest take-away is that I complicated this document by tying it too
> closely with "HBase on Cloud", treating the WAL+Ratis LogService as the
> only/biggest thing to figure out. This was inaccurate and overly bold of
> me: I apologize. I think this complicated discussion on a number of
> points, and ate a good bit of your time.
>
> My goal was to present this as an important part of a transition to the
> "cloud", giving justification to what WAL+Ratis helps HBase achieve. I
> did not want this document to be a step-by-step guide to a perfect HBase
> on Cloud design. I need to do a better job with this in the future; sorry.
>
> That said, my feeling is that, on the whole, folks are in support of the
> proposed changes/architecture described for the WAL+Ratis work (tl;dr
> revisit WAL API, plug in current WAL implementation to any API
> modification, build new Ratis-backed WAL impl). There were some concerns
> which still need immediate action that I am aware of:
>
> * Sync with Ram and Anoop re: in-memory WAL [1]
> * Where is Ratis LogService metadata kept? How do we know what
> LogStreams were being used/maintained by a RS? How does this tie into
> recovery?
>
> There are also long-term concerns which I don't think I have an answer
> for yet (for either reasons out of my control or a lack of technical
> understanding):
>
> * Maturity of the Ratis community
> * Required performance by HBase and the ability of the LogService to
> provide that perf (Areas already mentioned: gRPC perf, fsyncs bogging
> down disks, ability to scale RAFT quorums).
> * Continue with WAL-per-RS or move to WAL-per-Region? Related to perf,
> dependent upon Ratis scalability.
> * I/O amplification on WAL retention for backup and replication
> ("logstream export")
> * Ensure that LogStreams can be exported to a dist-filesystem in a
> manner which requires no additional metadata/handling (avoid more
> storage/mgmt complexity)
> * Ability to build krb5 authn into Ratis (really, gRPC)
>
> I will continue the two immediate action items. I think the latter
> concerns are some that will require fingers-on-keyboard -- I don't know
> enough about runtime characteristics without seeing it for myself.
>
> All this said, I'd like to start moving toward the point where we start
> breaking out this work into a feature-branch off of master and start
> building code. My hope is that this is amenable to everyone, with the
> acknowledgment that the Ratis work is considered "experimental" and not an
> attempt to make all of HBase use Ratis-backed WALs.
>
> Finally, I do *not* want this message to be interpreted as me squashing
> anyone's concerns. My honest opinion is that discussion has died down,
> but I will be the first to apologize if I have missed any outstanding
> concerns. Please, please, please ping me if I am negligent.
>
> Thanks once again for everyone's participation.
>
> [1]
>
> https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?disco=CBm3RLM
>
> On 2018/07/13 20:15:45, Josh Elser  wrote:
> > Hi all,
> >
> > A long time ago, I shared a document about a (I'll call it..) "vision"
> > where we make some steps towards decoupling HBase from HDFS in an effort
> > to make deploying HBase on Cloud IaaS providers a bit easier
> > (operational simplicity, effective use of common IaaS paradigms, etc).
> >
> >
> https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?usp=sharing
> >
> > A good ask from our Stack back then was: "[can you break down this
> > work]?" The original document was very high-level, and asking for some
> > more details makes a lot of sense. Months later, I'd like to share that
> > I've 

Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc

2018-07-25 Thread Josh Elser

Let me give an update on-list for everyone:

First and foremost, thank you very much to everyone who took the time to 
read this, with an extra thanks to those who participated in discussion. 
There were lots of great points raised. Some about things that were 
unclear in the doc, and others shining light onto subjects I hadn't 
considered yet.


My biggest take-away is that I complicated this document by tying it too 
closely with "HBase on Cloud", treating the WAL+Ratis LogService as the 
only/biggest thing to figure out. This was inaccurate and overly bold of 
me: I apologize. I think this complicated discussion on a number of 
points, and ate a good bit of your time.


My goal was to present this as an important part of a transition to the 
"cloud", giving justification to what WAL+Ratis helps HBase achieve. I 
did not want this document to be a step-by-step guide to a perfect HBase 
on Cloud design. I need to do a better job with this in the future; sorry.


That said, my feeling is that, on the whole, folks are in support of the 
proposed changes/architecture described for the WAL+Ratis work (tl;dr 
revisit WAL API, plug in current WAL implementation to any API 
modification, build new Ratis-backed WAL impl). There were some concerns 
which still need immediate action that I am aware of:


* Sync with Ram and Anoop re: in-memory WAL [1]
* Where is Ratis LogService metadata kept? How do we know what 
LogStreams were being used/maintained by a RS? How does this tie into 
recovery?


There are also long-term concerns which I don't think I have an answer 
for yet (for either reasons out of my control or a lack of technical 
understanding):


* Maturity of the Ratis community
* Required performance by HBase and the ability of the LogService to 
provide that perf (Areas already mentioned: gRPC perf, fsyncs bogging 
down disks, ability to scale RAFT quorums).
* Continue with WAL-per-RS or move to WAL-per-Region? Related to perf, 
dependent upon Ratis scalability.
* I/O amplification on WAL retention for backup and replication 
("logstream export")
* Ensure that LogStreams can be exported to a dist-filesystem in a 
manner which requires no additional metadata/handling (avoid more 
storage/mgmt complexity)

* Ability to build krb5 authn into Ratis (really, gRPC)

I will continue the two immediate action items. I think the latter 
concerns are some that will require fingers-on-keyboard -- I don't know 
enough about runtime characteristics without seeing it for myself.
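
To make one of those keyboard-sized items concrete, the "logstream export"
concern might start out looking roughly like the sketch below. The
LogStreamReader type is invented here (Ratis does not expose this exact shape);
only the Hadoop FileSystem calls are real, and a real export would presumably
write the existing WAL file format so that backup and replication tooling needs
no extra metadata:

    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LogStreamExporter {

      /** Hypothetical handle over a log stream's records. */
      public interface LogStreamReader extends Iterable<byte[]>, AutoCloseable {}

      /** Copies every record into a single file on the destination filesystem. */
      public static void export(LogStreamReader stream, URI destFs, String destFile,
          Configuration conf) throws IOException {
        FileSystem fs = FileSystem.get(destFs, conf);
        try (FSDataOutputStream out = fs.create(new Path(destFile))) {
          for (byte[] record : stream) {
            out.writeInt(record.length);   // simple length-prefixed framing
            out.write(record);
          }
          out.hflush();                    // flush to the filesystem before close
        }
      }
    }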


All this said, I'd like to start moving toward the point where we start 
breaking out this work into a feature-branch off of master and start 
building code. My hope is that this is amenable to everyone, with the 
acknowledgment that the Ratis work is considered "experimental" and not an 
attempt to make all of HBase use Ratis-backed WALs.


Finally, I do *not* want this message to be interpreted as me squashing 
anyone's concerns. My honest opinion is that discussion has died down, 
but I will be the first to apologize if I have missed any outstanding 
concerns. Please, please, please ping me if I am negligent.


Thanks once again for everyone's participation.

[1] 
https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?disco=CBm3RLM


On 2018/07/13 20:15:45, Josh Elser  wrote:

Hi all,


A long time ago, I shared a document about a (I'll call it..) "vision" 
where we make some steps towards decoupling HBase from HDFS in an effort 
to make deploying HBase on Cloud IaaS providers a bit easier 
(operational simplicity, effective use of common IaaS paradigms, etc).


https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?usp=sharing

A good ask from our Stack back then was: "[can you break down this 
work]?" The original document was very high-level, and asking for some 
more details makes a lot of sense. Months later, I'd like to share that 
I've updated the original document with some new content at the bottom 
(as well as addressed some comments which went unanswered by me -- sorry!)


Based on a discussion I had earlier this week (and some discussions 
during HBaseCon in California in June), I've tried to add a brief 
"refresher" on what some of the big goals for this effort are. Please 
check it out at your leisure and let me know what you think. Would like 
to start getting some fingers behind this all and pump out some code :)


https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#bookmark=id.fml9ynrqagk

- Josh



[DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc

2018-07-13 Thread Josh Elser

Hi all,

A long time ago, I shared a document about a (I'll call it..) "vision" 
where we make some steps towards decoupling HBase from HDFS in an effort 
to make deploying HBase on Cloud IaaS providers a bit easier 
(operational simplicity, effective use of common IaaS paradigms, etc).


https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?usp=sharing

A good ask from our Stack back then was: "[can you break down this 
work]?" The original document was very high-level, and asking for some 
more details makes a lot of sense. Months later, I'd like to share that 
I've updated the original document with some new content at the bottom 
(as well as addressed some comments which went unanswered by me -- sorry!)


Based on a discussion I had earlier this week (and some discussions 
during HBaseCon in California in June), I've tried to add a brief 
"refresher" on what some of the big goals for this effort are. Please 
check it out at your leisure and let me know what you think. Would like 
to start getting some fingers behind this all and pump out some code :)


https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#bookmark=id.fml9ynrqagk

- Josh