Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
Hi Josh,

I’m interested in this, too. I’ll start to read and follow what you suggested to Jack.

Thanks,
Toshi

> On Aug 7, 2018, at 09:12, Jack Bearden wrote:
>
> Thanks Josh that was helpful. I'll start doing some of my own research
> around these topics and look into that Ratis ticket. Much appreciated!
Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
Thanks Josh, that was helpful. I'll start doing some of my own research around these topics and look into that Ratis ticket. Much appreciated!

On Mon, Aug 6, 2018 at 8:04 AM, Josh Elser wrote:

> Yup, replication is a big one to "unravel". Repeating myself from a branch
> in the thread, but I'd expect some initial suggestions on what a new API
> could be this week. Certainly the first draft won't be the final -- would
> be great to get your input after your AsyncWAL work, Duo.
>
> Using AWS Simple Queue Service (SQS), or pretty much anything else, would
> be great. I want to make sure that, while we try to "scratch this one
> itch", we pave the way for whatever else folks want to experiment with.
Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
Yup, replication is a big one to "unravel". Repeating myself from a branch in the thread, but I'd expect some initial suggestions on what a new API could be this week. Certainly the first draft won't be the final -- would be great to get your input after your AsyncWAL work, Duo.

Using AWS Simple Queue Service (SQS), or pretty much anything else, would be great. I want to make sure that, while we try to "scratch this one itch", we pave the way for whatever else folks want to experiment with.

On 8/4/18 5:10 AM, 张铎(Duo Zhang) wrote:

> Yes, maybe we could write WAL to SQS and HFile to S3, then we can deploy HBase on AWS without any local storage volumes...
>
> But we also need a good abstraction for Replication, as the current design is file based...
Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
Thanks for your interest, Jack.

Taking a read through the "design doc" and related discussion is a good starting point. Re-linked here for your convenience [1].

https://issues.apache.org/jira/browse/HBASE-20951 has a break-down of work items. The first thing we need to work on is the WAL API. I know that our Ted and Ankit have been digging through the current codebase to get a starting point. I'd expect some initial "pseudocode" this week, but this will be a long pole to get something agreed upon, I imagine.

You can also look at https://issues.apache.org/jira/browse/RATIS-271, which will likely be quicker to have code getting committed. A few of us have been playing around with the code there already, and doing a similar API "deep-dive" exercise.

And, if it doesn't go without saying, feel free to ask questions on the mailing list here or in Slack. I'll do my best to answer them in a timely fashion :)

- Josh

[1] https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?disco=CBm3RLM

On 8/3/18 2:42 PM, Jack Bearden wrote:

> Great work! I'm excited about this feature and want to help with development. What do you guys suggest are the best topics to ramp up in preparation for these upcoming sprints?
Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
Yes, maybe we could write WAL to SQS and HFile to S3, then we can deploy HBase on AWS without any local storage volumes...

But we also need a good abstraction for Replication, as the current design is file based...

2018-07-27 1:28 GMT+08:00 Zach York:

> I would REALLY hope that the WAL interface/API changes would go into master
> even if the feature work for Ratis is going in a feature branch. Not only
> would this enable other backends to be developed in parallel with the Ratis
> solution if there are other good fits for a non-HDFS WAL, but also it would
> save the burden of having to rebase these core changes onto the latest
> master to maintain compatibility. I'm assuming the Ratis portion of the
> code would be mostly new files so these would be less of a concern.
Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
Great work! I'm excited about this feature and want to help with development. What do you guys suggest are the best topics to ramp up in preparation for these upcoming sprints?

On Fri, Aug 3, 2018 at 10:18 AM, Josh Elser wrote:

> Yup, we're on the same page :)
Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
Yup, we're on the same page :)

On 7/26/18 1:28 PM, Zach York wrote:

> I would REALLY hope that the WAL interface/API changes would go into master even if the feature work for Ratis is going in a feature branch. Not only would this enable other backends to be developed in parallel with the Ratis solution if there are other good fits for a non-HDFS WAL, but also it would save the burden of having to rebase these core changes onto the latest master to maintain compatibility. I'm assuming the Ratis portion of the code would be mostly new files so these would be less of a concern.
Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
I would REALLY hope that the WAL interface/API changes would go into master even if the feature work for Ratis is going in a feature branch. Not only would this enable other backends to be developed in parallel with the Ratis solution if there are other good fits for a non-HDFS WAL, but also it would save the burden of having to rebase these core changes onto the latest master to maintain compatibility. I'm assuming the Ratis portion of the code would be mostly new files so these would be less of a concern.

On Thu, Jul 26, 2018 at 9:24 AM, Josh Elser wrote:

> On 7/26/18 1:00 AM, Stack wrote:
>
>>> All this said, I'd like to start moving toward the point where we start
>>> breaking out this work into a feature-branch off of master and start
>>> building code. My hope is that this is amenable to everyone, with the
>>> acknowledgement that the Ratis work is considered "experimental" and not an
>>> attempt to make all of HBase use Ratis-backed WALs.
>>
>> Go for it.
>>
>> The branch would have WAL API changes only or would it include Ratis WAL
>> dev? (If the latter, would that be better done over on Ratis project?).
>
> I think we would start with WAL API changes, get those "blessed", and then
> continue Ratis WAL dev after that.
Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
On 7/26/18 1:00 AM, Stack wrote:

>> All this said, I'd like to start moving toward the point where we start breaking out this work into a feature-branch off of master and start building code. My hope is that this is amenable to everyone, with the acknowledgement that the Ratis work is considered "experimental" and not an attempt to make all of HBase use Ratis-backed WALs.
>
> Go for it.
>
> The branch would have WAL API changes only or would it include Ratis WAL dev? (If the latter, would that be better done over on Ratis project?).

I think we would start with WAL API changes, get those "blessed", and then continue Ratis WAL dev after that.
Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
On Wed, Jul 25, 2018 at 11:55 AM Josh Elser wrote:

> ...
> My biggest take-away is that I complicated this document by tying it too closely with "HBase on Cloud", treating the WAL+Ratis LogService as the only/biggest thing to figure out. This was inaccurate and overly bold of me: I apologize. I think this complicated discussion on a number of points, and ate a good bit of some of your time.

No need of apology. There was healthy back and forth. You read the feedback and took it on board. (See below).

> My goal was to present this as an important part of a transition to the "cloud", giving justification to what WAL+Ratis helps HBase achieve. I did not want this document to be a step-by-step guide to a perfect HBase on Cloud design. I need to do a better job with this in the future; sorry.
>
> That said, my feeling is that, on the whole, folks are in support of the proposed changes/architecture described for the WAL+Ratis work (tl;dr revisit WAL API, plug in current WAL implementation to any API modification, build new Ratis-backed WAL impl). There were some concerns which still need immediate action that I am aware of:
>
> * Sync with Ram and Anoop re: in-memory WAL [1]
> * Where is Ratis LogService metadata kept? How do we know what LogStreams were being used/maintained by a RS? How does this tie into recovery?
>
> There are also long-term concerns which I don't think I have an answer for yet (for either reasons out of my control or a lack of technical understanding):
>
> * Maturity of the Ratis community
> * Required performance by HBase and the ability of the LogService to provide that perf (Areas already mentioned: gRPC perf, fsyncs bogging down disks, ability to scale RAFT quorums).
> * Continue with WAL-per-RS or move to WAL-per-Region? Related to perf, dependent upon Ratis scalability.
> * I/O amplification on WAL retention for backup and replication ("logstream export")
> * Ensure that LogStreams can be exported to a dist-filesystem in a manner which requires no additional metadata/handling (avoid more storage/mgmt complexity)
> * Ability to build krb5 authn into Ratis (really, gRPC)
>
> I will continue the two immediate action items. I think the latter concerns are some that will require fingers-on-keyboard -- I don't know enough about runtime characteristics without seeing it for myself.
>
> All this said, I'd like to start moving toward the point where we start breaking out this work into a feature-branch off of master and start building code. My hope is that this is amenable to everyone, with the acknowledgement that the Ratis work is considered "experimental" and not an attempt to make all of HBase use Ratis-backed WALs.

Go for it.

The branch would have WAL API changes only or would it include Ratis WAL dev? (If the latter, would that be better done over on Ratis project?).

S

> Finally, I do *not* want this message to be interpreted as me squashing anyone's concerns. My honest opinion is that discussion has died down, but I will be the first to apologize if I have missed any outstanding concerns. Please, please, please ping me if I am negligent.
>
> Thanks once again for everyone's participation.
>
> [1] https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?disco=CBm3RLM
>
> On 2018/07/13 20:15:45, Josh Elser wrote:
> > Hi all,
> >
> > A long time ago, I shared a document about a (I'll call it..) "vision" where we make some steps towards decoupling HBase from HDFS in an effort to make deploying HBase on Cloud IaaS providers a bit easier (operational simplicity, effective use of common IaaS paradigms, etc).
> >
> > https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?usp=sharing
> >
> > A good ask from our Stack back then was: "[can you break down this work]?" The original document was very high-level, and asking for some more details makes a lot of sense. Months later, I'd like to share that I've updated the original document with some new content at the bottom (as well as addressed some comments which went unanswered by me -- sorry!)
> >
> > Based on a discussion I had earlier this week (and some discussions during HBaseCon in California in June), I've tried to add a brief "refresher" on what some of the big goals for this effort are. Please check it out at your leisure and let me know what you think. Would like to start getting some fingers behind this all and pump out some code :)
> >
> > https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#bookmark=id.fml9ynrqagk
> >
> > - Josh
Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
Thanks, Zach! I like your suggestion about project updates. I sincerely hope that this can be something transparent enough that folks who want to follow along and participate in implementation can do so. Let me think about how to drive this better.

On 7/25/18 3:55 PM, Zach York wrote:

> +1 to starting the work. I think most of the concerns can be figured out on the JIRAs and we can have a project update every X weeks if enough people are interested.
>
> I also agree to frame the feature correctly. Decoupling from a HDFS WAL or WAL on Ratis would be more appropriate names that would better convey the scope. I think there are a number of projects necessary to complete "HBase on Cloud" with this being one of those.
>
> Thanks for driving this initiative!
>
> Zach
Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
Thanks, Andrew. I was really upset that I was butting heads with you when I would have previously thought that I had a design which was in line with something you would have called "good".

I will wholly take the blame for not having an as-clear-as-possible design doc. I am way down in the weeds and didn't bring myself up for air before trying to write something consumable for everyone else.

Making a good API is my biggest goal for the HBase side, and my hope is that it will support this experiment, enable others who want to try out other systems, and simplify our existing WAL implementations.

Thanks for the reply.

On 7/25/18 3:50 PM, Andrew Purtell wrote:

>> My biggest take-away is that I complicated this document by tying it too closely with "HBase on Cloud", treating the WAL+Ratis LogService as the only/biggest thing to figure out.
>
> Understanding this now helps a lot to understand better the positions taken from the doc. At first glance it read as an initially interesting document that quickly went to a weird place where there was a preconceived solution working backward toward a problem, engineering run in reverse. I think it's perfectly fine if the Ratis podling and those associated with it want to drive development and/or adoption by finding candidate use cases in other ecosystem projects. As long as we have good interfaces which don't leak internals, no breaking core changes, no hard dependencies on incubating artifacts, and at least a potential path forward to alternate implementations it's all good!
Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
+1 to starting the work. I think most of the concerns can be figured out on the JIRAs, and we can have a project update every X weeks if enough people are interested.

I also agree that we should frame the feature correctly. "Decoupling from an HDFS WAL" or "WAL on Ratis" would be more appropriate names that better convey the scope. I think there are a number of projects necessary to complete "HBase on Cloud", with this being one of those.

Thanks for driving this initiative!

Zach
Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
> My biggest take-away is that I complicated this document by tying it too
> closely with "HBase on Cloud", treating the WAL+Ratis LogService as the
> only/biggest thing to figure out.

Understanding this now helps a lot in understanding the positions taken in the doc. At first glance it read as an initially interesting document that quickly went to a weird place, where a preconceived solution was working backward toward a problem: engineering run in reverse.

I think it's perfectly fine if the Ratis podling and those associated with it want to drive development and/or adoption by finding candidate use cases in other ecosystem projects. As long as we have good interfaces which don't leak internals, no breaking core changes, no hard dependencies on incubating artifacts, and at least a potential path forward to alternate implementations, it's all good!
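One way to read the "no hard dependencies on incubating artifacts" and "path forward to alternate implementations" conditions is that core code only ever names an interface, and the optional backend is looked up by class name at runtime. The Java below is a minimal sketch of that idea under stated assumptions: `WalBackend`, `DefaultWalBackend` and `WalBackendLoader` are hypothetical names invented for illustration, not HBase or Ratis classes, and falling back to a default when the class is absent is just one possible policy.

```java
/** Hypothetical marker for any WAL backend; stands in for a real WAL API. */
interface WalBackend {
    String name();
}

/** Default backend that ships with core in this sketch (hypothetical). */
final class DefaultWalBackend implements WalBackend {
    @Override
    public String name() {
        return "default";
    }
}

final class WalBackendLoader {
    private WalBackendLoader() {}

    /**
     * Loads the backend named by className via reflection, so core code never
     * references the optional implementation at compile time. Falls back to
     * the default backend when the class is not on the classpath.
     */
    static WalBackend load(String className) {
        try {
            Class<? extends WalBackend> clazz =
                Class.forName(className).asSubclass(WalBackend.class);
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException e) {
            // Optional artifact is absent: keep running on the default.
            return new DefaultWalBackend();
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Class exists but is not a usable WalBackend: fail loudly.
            throw new IllegalStateException(
                "Cannot instantiate WAL backend " + className, e);
        }
    }
}
```

With this shape, removing the optional jar changes which backend is returned but does not break compilation or startup of the core module.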
Re: [DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
Let me give an update on-list for everyone:

First and foremost, thank you very much to everyone who took the time to read this, with an extra thanks to those who participated in discussion. There were lots of great points raised: some about things that were unclear in the doc, and others shining light onto subjects I hadn't considered yet.

My biggest take-away is that I complicated this document by tying it too closely with "HBase on Cloud", treating the WAL+Ratis LogService as the only/biggest thing to figure out. This was inaccurate and overly bold of me: I apologize. I think this complicated discussion on a number of points, and ate a good bit of some of your time.

My goal was to present this as an important part of a transition to the "cloud", giving justification to what WAL+Ratis helps HBase achieve. I did not want this document to be a step-by-step guide to a perfect HBase on Cloud design. I need to do a better job with this in the future; sorry.

That said, my feeling is that, on the whole, folks are in support of the proposed changes/architecture described for the WAL+Ratis work (tl;dr revisit WAL API, plug in current WAL implementation to any API modification, build new Ratis-backed WAL impl). There were some concerns which still need immediate action that I am aware of:

* Sync with Ram and Anoop re: in-memory WAL [1]
* Where is Ratis LogService metadata kept? How do we know what LogStreams were being used/maintained by a RS? How does this tie into recovery?

There are also long-term concerns which I don't think I have an answer for yet (either for reasons out of my control or for a lack of technical understanding):

* Maturity of the Ratis community
* Required performance by HBase and the ability of the LogService to provide that perf (areas already mentioned: gRPC perf, fsyncs bogging down disks, ability to scale RAFT quorums)
* Continue with WAL-per-RS or move to WAL-per-Region? Related to perf, dependent upon Ratis scalability.
* I/O amplification on WAL retention for backup and replication ("logstream export")
* Ensure that LogStreams can be exported to a dist-filesystem in a manner which requires no additional metadata/handling (avoid more storage/mgmt complexity)
* Ability to build krb5 authn into Ratis (really, gRPC)

I will continue the two immediate action items. I think the latter concerns are some that will require fingers-on-keyboard -- I don't know enough about runtime characteristics without seeing it for myself.

All this said, I'd like to start moving toward the point where we start breaking out this work into a feature-branch off of master and start building code. My hope is that this is amenable to everyone, with the acknowledgement that the Ratis work is considered "experimental" and not an attempt to make all of HBase use Ratis-backed WALs.

Finally, I do *not* want this message to be interpreted as me squashing anyone's concerns. My honest opinion is that discussion has died down, but I will be the first to apologize if I have missed any outstanding concerns. Please, please, please ping me if I am negligent.

Thanks once again for everyone's participation.

[1] https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?disco=CBm3RLM
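The tl;dr in the update above (revisit the WAL API, plug the current implementation in behind it, then build a new backend) could look roughly like the following. This is only a sketch under assumptions: `WriteAheadLog`, `InMemoryWal` and `EditWriter` are hypothetical names made up for illustration, the three methods are a guess at a minimal contract rather than HBase's actual WAL interface, and the in-memory class merely stands in for whatever file-based or Ratis-backed implementation would sit behind the API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical storage-agnostic WAL contract; NOT HBase's actual WAL interface. */
interface WriteAheadLog extends AutoCloseable {
    /** Buffers an entry and returns its sequence id. */
    long append(byte[] entry);

    /** Marks every entry up to and including sequenceId as durable. */
    void sync(long sequenceId);

    /** Returns durable entries with sequence id >= fromSequenceId, e.g. for recovery. */
    List<byte[]> replay(long fromSequenceId);

    @Override
    void close();
}

/** Stand-in backend for the sketch; a real one would write to files or a log service. */
final class InMemoryWal implements WriteAheadLog {
    private final List<byte[]> entries = new ArrayList<>();
    private long syncedUpTo = -1L;

    @Override
    public synchronized long append(byte[] entry) {
        entries.add(entry.clone());
        return entries.size() - 1L;
    }

    @Override
    public synchronized void sync(long sequenceId) {
        if (sequenceId < 0 || sequenceId >= entries.size()) {
            throw new IllegalArgumentException("Unknown sequence id: " + sequenceId);
        }
        syncedUpTo = Math.max(syncedUpTo, sequenceId);
    }

    @Override
    public synchronized List<byte[]> replay(long fromSequenceId) {
        List<byte[]> out = new ArrayList<>();
        // Only entries that were synced are considered durable and replayed.
        for (long i = Math.max(0L, fromSequenceId); i <= syncedUpTo; i++) {
            out.add(entries.get((int) i).clone());
        }
        return out;
    }

    @Override
    public void close() {}
}

/** Caller written only against the interface, so the backend can be swapped. */
final class EditWriter {
    private final WriteAheadLog wal;

    EditWriter(WriteAheadLog wal) {
        this.wal = wal;
    }

    long writeDurably(String edit) {
        long seq = wal.append(edit.getBytes(StandardCharsets.UTF_8));
        wal.sync(seq);
        return seq;
    }
}
```

Because `EditWriter` depends only on `WriteAheadLog`, swapping `InMemoryWal` for another backend would not require touching the caller, which is the property the "revisit WAL API" step is after.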
[DISCUSS] Expanded "work items" for HBase-in-the-Cloud doc
Hi all,

A long time ago, I shared a document about a (I'll call it..) "vision" where we make some steps towards decoupling HBase from HDFS in an effort to make deploying HBase on Cloud IaaS providers a bit easier (operational simplicity, effective use of common IaaS paradigms, etc).

https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?usp=sharing

A good ask from our Stack back then was: "[can you break down this work]?" The original document was very high-level, and asking for some more details makes a lot of sense. Months later, I'd like to share that I've updated the original document with some new content at the bottom (as well as addressed some comments which went unanswered by me -- sorry!)

Based on a discussion I had earlier this week (and some discussions during HBaseCon in California in June), I've tried to add a brief "refresher" on what some of the big goals for this effort are. Please check it out at your leisure and let me know what you think. Would like to start getting some fingers behind this all and pump out some code :)

https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#bookmark=id.fml9ynrqagk

- Josh