[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244405#comment-17244405 ] Francis Christopher Liu edited comment on HBASE-11288 at 12/5/20, 5:38 AM:

Good idea [~stack]. +1 Let's leave the baggage here so it'll be easier for the community to get involved.

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
> Issue Type: Umbrella
> Components: meta
> Reporter: Francis Christopher Liu
> Assignee: Francis Christopher Liu
> Priority: Major
> Attachments: jstack20200807_bad_rpc_priority.txt, root_priority.patch
>

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175227#comment-17175227 ] Francis Christopher Liu edited comment on HBASE-11288 at 8/11/20, 4:59 AM:

Apologies for the late reply. I got caught up dealing with the issues to get my setup running. Right now it's running and 4 of 5 iterations have finished. I'm using the command [~stack] shared. Here's a rundown of some of the things I dealt with so far:
1. Disabled netty native transport - regionserver JVMs would basically freeze; I couldn't even take a jstack without "-F". I found out that internally we don't run Hadoop with native transport, as it apparently has a lot of issues (wrong fds getting closed, etc.).
2. Disabled security - there was an issue with TokenProvider throwing IllegalThreadStateException on startup, causing the RS to abort.
3. Hadoop 2.8 and ZooKeeper 3.4.x - made some quick changes so I could downgrade the dependencies to what we use internally. Shouldn't affect the validity of the test?
4. MonitorTaskImpl.prettyPrintJournal was throwing an NPE, causing the RS to abort - might be related to me using "filesystem" as the WAL provider. I changed the code to swallow the exception for now as it seems unrelated. Will try to rerun the test using the async WAL provider.
5. SCP bug introduced by the patch - caused meta not to get assigned. Caught and fixed this quickly and early.
6. [~stack]'s root executor patch - didn't run into this issue, but the patch makes sense, so I've applied it to the branch and have been running with it.
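Item 4 above involves switching WAL providers between runs. As a hedged illustration (the property name and provider values are real HBase settings, but verify the available values against your HBase version), the provider is selected in hbase-site.xml:

```xml
<!-- Select the WAL implementation. "filesystem" is the older FSHLog-based
     provider used in the run above; "asyncfs" is the async provider the
     rerun would exercise. -->
<property>
  <name>hbase.wal.provider</name>
  <value>filesystem</value> <!-- change to "asyncfs" for the async WAL rerun -->
</property>
```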
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173524#comment-17173524 ] Michael Stack edited comment on HBASE-11288 at 8/8/20, 12:28 AM:

On the latest ITBLL run I ran into a deadlock: a combination of chaos monkey DDOS'ing the 'default' RPC handlers, and then hbase:root failing to report a successful open because its priority was default rather than elevated – it was locked out. Will attach a jstack and a patch to make hbase:root ops get priority (the patch is a hack... needs polish... it adds a root executor and ups the priority if the table is root). [^jstack20200807_bad_rpc_priority.txt] [^root_priority.patch]
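The attached root_priority.patch is described as adding a root executor and upping the priority of root-table ops. A minimal self-contained sketch of that idea – the constants and method here are illustrative stand-ins, not HBase's actual HConstants/RpcScheduler API:

```java
// Sketch of the root_priority.patch idea: give hbase:root RPCs a priority above
// hbase:meta (and route them to a dedicated executor) so a flood on the
// 'default' handlers cannot lock out the open/report calls for the root region.
// All names and values below are illustrative, not HBase's real constants.
public class RootPriority {
    static final int NORMAL_QOS = 0;   // user-table ops share default handlers
    static final int HIGH_QOS = 200;   // what system-table (meta) ops get
    static final int ROOT_QOS = 300;   // proposed: above meta, served by a root executor

    /** Return the handler priority for an RPC touching the given table. */
    static int priorityFor(String tableName) {
        if ("hbase:root".equals(tableName)) {
            return ROOT_QOS;
        }
        if ("hbase:meta".equals(tableName)) {
            return HIGH_QOS;
        }
        return NORMAL_QOS;
    }
}
```

With a scheme like this, a chaos-monkey DDOS on the normal-priority handlers leaves the root executor's queue untouched, avoiding the lockout seen in the attached jstack.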
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157846#comment-17157846 ] Francis Christopher Liu edited comment on HBASE-11288 at 7/15/20, 3:39 AM:

I am all for “collaborate and work out a path forward”. How do we go about doing that? My previous proposal was trying to add some rigor by tackling the most pressing issue (in terms of picking one approach over the other), which, based on earlier discussions and the documentation, seemed to be tiering. Any proposal? I have some technical responses to the recent posts here but will try to hold off until we can agree on a path forward. (78 words)
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156783#comment-17156783 ] Michael Stack edited comment on HBASE-11288 at 7/13/20, 3:27 PM:

In this issue, I want a split meta – not fixes for hotspotting, nor to develop a cache for the 'meta' table. There are alternative approaches to splitting meta – one that has a version in production at scale on hbase1, with multiple attempts at landing the large patch updated for hbase2, and another that is being made up on the fly via PRs, making use of some nice new features that have emerged of late. Given the alternatives, I want us to collaborate and work out a path forward; an agreed-upon design with our decisions (PRs w/o a design end up with a pissed-off OA annoyed at dumb questions trying to elicit architecture) (112 words not including these).
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154253#comment-17154253 ] Duo Zhang edited comment on HBASE-11288 at 7/9/20, 7:28 AM:

{quote}Let's follow Stack's previous comment? Let's not discuss replicas or caching as they could be applied to either split meta implementation. Let's focus on the tiering issue?{quote}
I do not know how we can make a cache server for a general root table. Please give a clear design on how to do it.
{quote}I'm missing something. It is a POC and its intent was to gather information and help answer critical questions like this. So I'm not sure what the concern is against POCs doing what POCs are supposed to be used for? Rest assured, for this particular case I am more concerned about how we came to the decision than the actual decision. It would be best for everyone if we could apply some rigor for something so important.{quote}
It is you that suggested we run ITBLL against the POC, not me. Correct me if I'm wrong.
{quote}It is definitely good for that. But what prevents us from using it for this purpose? Whether it always passes, sometimes fails, or always fails, we learn something, and that is valuable for determining whether proc v2 can support 2-tier now, short term, long term, or never. Then we can come to an informed decision. For example, we might decide we cannot wait that long for proc v2 to mature, so we go with 1-tier.{quote}
I have shown my opinion above. I do not think this will help. But you're free to run ITBLL by yourself and post the result here; no one can stop you doing this, right?
{quote}It does help, but it is still making a compromise to avoid the concerns of 2-tier. Which is why my main concern now is applying some rigor to help us come to a well-informed decision as to whether proc v2 cannot support it and we should go with 1-tier. I think proc v2 is what was missing that prevented us from succeeding in the past, although it’s possible it may not be mature enough at this stage.{quote}
I'm a bit confused. Why do you think that if proc v2 can support 2-tier then we should use 2-tier? Please focus on the problem of the 'master local region' itself? Or is your point that the 'master local region' does not use 2-tier so it is not a good solution? This does not make sense to me. Thanks.
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118515#comment-17118515 ] Francis Christopher Liu edited comment on HBASE-11288 at 5/28/20, 10:01 AM:

[~zhangduo] [~stack] since you guys had big concerns with tiered assignment, I would appreciate comments on anything undesirable I might be doing in the PR, as well as suggestions if you have any. Note the PR is a draft, so there is likely room for improvement. Past that, would a long run of ChaosMonkey be a way to validate that it's stable?
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118504#comment-17118504 ] Francis Christopher Liu edited comment on HBASE-11288 at 5/28/20, 9:54 AM:

{quote}I will try to implement splittable meta by storing the root table in a 'master local region', on the feature branch.{quote}
I see – is the intent to do a POC with the master local region?
{quote}HBASE-24389, the goal is to move the location of meta to the 'master local region', and also remove the assumption that there is only a single meta region as much as possible in code, without actually splitting the meta table.{quote}
I see. From my experience having written this patch a few times, adding root and then generalizing hbase:meta (e.g. moving away from a single meta region) is most of the code changes; the actual meta-splitting code is not a lot, since it's basically just another region. Also, the true way to test whether you've done the job is to actually have more than one meta region, so it doesn't really make sense to separate generalizing meta handling from actually splitting the table – it might actually be more work if you do it that way.
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118032#comment-17118032 ] Michael Stack edited comment on HBASE-11288 at 5/27/20, 8:10 PM:

[I started a one-pager design doc|https://docs.google.com/document/d/1OYrCfpmmLPkSa5-AepQw8hv09zNcNPaFzSXezhcMS7E/edit#]. Pile on. Currently it has only some Requirements and Questions.
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117616#comment-17117616 ] Francis Christopher Liu edited comment on HBASE-11288 at 5/27/20, 10:22 AM:

Hi guys, just following up on what everyone's thoughts are on where meta region locations will actually be stored. Are we already sold on one direction over another? Do we need more discussion? Do we need investigation to figure out which is better (e.g. a POC)? So far it's between storing it in the master and the root table. Correct me if I'm wrong or oversimplifying, [~zhangduo], but the main contention/concern seems to be between the complexity that tiered assignment adds and putting the responsibility on the master? Also, I see a meta location store for master in another feature branch (HBASE-11288.splittable-meta)? Is this a POC or is this the direction we are going? Let me know, as I've rebased the root patch and been working on adding split meta support to the HBASE-11288 feature branch.
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113080#comment-17113080 ] Francis Christopher Liu edited comment on HBASE-11288 at 5/21/20, 11:23 AM:

[~apurtell] thanks for the explanation, I just have a few follow-up questions if that's ok.
{quote}In contrast, what if we start exporting a subset of regionservers. How does that work? It's guaranteed to be different from how zk quorum strings worked.{quote}
I'm pretty sure it's well thought out; I'm just trying to understand the thought process, in case we need to consider creating a regionserver registry, so we understand the tradeoff. My thinking was providing a static subset of regionservers the same way that's being done with providing a static list of masters. Are we not able to provide a static subset of masters in the current implementation?
{quote}There are a lot more of them, they are randomly chosen, the connection info has to be dynamic instead of static (how is that collected? published to clients?).{quote}
I think there are a few ways of picking the static subset. Let me know if these make sense. One way would be to pick regionservers that are part of the regionserver group hosting meta. Another is to provide a seed list and an API to allow the client to discover more if needed – someone mentioned this is how Cassandra does it, a while back when we were talking about a regionserver registry. Another is to provide a single domain name that resolves to a static subset of regionserver IPs, which the operator can choose to update dynamically in case one is decommissioned or a new one is added (this should work for the master approach too, though you would need a resolver); this way the client config does not need to change.
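The "single domain name" option above can be sketched with a small self-contained helper; the resolver is injected as a function so the sketch needs no live DNS zone or HBase API. The name and port are made up for illustration:

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch of the single-domain-name bootstrap idea: the client is
// configured with one DNS name, and the operator maintains the records behind
// it to point at whichever servers currently serve location lookups.
public class DnsSeedList {
    /** Resolve the configured bootstrap name into "host:port" endpoint strings. */
    static List<String> bootstrapEndpoints(String dnsName, int port,
                                           Function<String, List<String>> resolver) {
        return resolver.apply(dnsName).stream()
                .map(ip -> ip + ":" + port)
                .collect(Collectors.toList());
    }
}
```

An operator would then add or remove records behind the one name as servers are decommissioned or added, with no change to client configuration.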
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113085#comment-17113085 ] Francis Christopher Liu edited comment on HBASE-11288 at 5/21/20, 11:11 AM:

{quote}At least we have to provide an API to fetch all the content in the root table, as the cache service needs it. So HBCK could also use this API. But this API will not be part of the client-facing API; for clients, they just need a simple locateMeta method.{quote}
I see. You would need mutation APIs as well, to correct the errors.
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112530#comment-17112530 ] Andrew Kyle Purtell edited comment on HBASE-11288 at 5/20/20, 7:19 PM:

bq. I agree publishing all the regionserver locations wouldn't make sense, but I'm curious and probably missing something as to why we can't publish a subset of regionserver locations?

Because we also want migrating service discovery and configuration considerations to closely align, to avoid having to rethink how clients discover the cluster service. Right now operators have to pass around a string of zookeeper endpoints. As you know, zookeeper quorums are limited in size by how they scale writes. And, in many deployments, ZK and masters might be colocated, because they are both _meta_ services (metadata and coordination), and their deployments have the same shape both in terms of number of instances and placement with respect to failure domains. Summing these considerations, it is highly advantageous if, managing a string of ZK endpoints to bootstrap clients today, tomorrow you have the option of simply replacing the string of zk service locations with a string of master service locations. Same order of scale. Probably, same failure domain engineering. Possibly, colocation. It might be the same list of hostnames served to clients the same way the zk quorum string was previously served, with only the port numbers changing. In contrast, what if we start exporting a subset of regionservers? How does that work? It's guaranteed to be different from how zk quorum strings worked. There are a lot more of them, they are randomly chosen (?), the connection info has to be dynamic instead of static (how is that collected? published to clients?). I could go on. The alternative, described above, has been well thought out, IMHO.
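The swap described above – same endpoint-list shape, only the service behind it changes – amounts to a client configuration change. Property names follow the master-registry work (HBASE-18095); treat the exact names and ports as assumptions for your HBase version:

```xml
<!-- Before: clients bootstrap from the zookeeper quorum string. -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>host1,host2,host3</value>
</property>

<!-- After: possibly the same hosts, now addressed as master RPC endpoints. -->
<property>
  <name>hbase.masters</name>
  <value>host1:16000,host2:16000,host3:16000</value>
</property>
```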
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112530#comment-17112530 ] Andrew Kyle Purtell edited comment on HBASE-11288 at 5/20/20, 7:17 PM: --- bq. I agree publishing all the regionserver locations wouldn't make sense, but I'm curious and probably missing something as to why can't we publish a subset of regionserver locations? Because we also want migrating service discovery and configuration considerations to closely align, to avoid having to rethink how clients discover cluster service. Right now operators have to pass around a string of zookeeper endpoints. As you know zookeeper quorums are limited in size by how they scale writes. And, in many deployments, ZK and masters might be colocated, because they are both _meta_ services (metadata and coordination), and their deployments have the same shape both in terms of number of instances and placement with respect to failure domains. Sum these considerations, it is highly advantageous if you have a string of ZK endpoints today, and tomorrow you simply replace this string with a string of master endpoints. Probably its the same list of hostnames served to clients the same way the zk quorum string was previously served, with only the port numbers changing. In contrast, what if we start exporting a subset of regionservers. How does that work? It's guaranteed to be different from how zk quorum strings worked. There are a lot more of them, they are randomly chosen (?), the connection info has to dynamic instead of static (how is that collected? published to clients?). I could go on. The alternative, described above, has been well thought out, IMHO. was (Author: apurtell): bq. I agree publishing all the regionserver locations wouldn't make sense, but I'm curious and probably missing something as to why can't we publish a subset of regionserver locations? 
Because we also want migrating service discovery and configuration considerations to closely align, to avoid having to rethink how clients discover cluster service. Right now operators have to pass around a string of zookeeper endpoints. As you know zookeeper quorums are limited in size by how they scale writes. And, in many deployments, ZK and masters might be colocated, because they are both metadata services, and their deployments have the same shape both in terms of number of instances and placement with respect to failure domains. Sum these considerations, it is highly advantageous if you have a string of ZK endpoints today, and tomorrow you simply replace this string with a string of master endpoints. Probably its the same list of hostnames served to clients the same way the zk quorum string was previously served, with only the port numbers changing. In contrast, what if we start exporting a subset of regionservers. How does that work? It's guaranteed to be different from how zk quorum strings worked. There are a lot more of them, they are randomly chosen (?), the connection info has to dynamic instead of static (how is that collected? published to clients?). I could go on. The alternative, described above, has been well thought out, IMHO. > Splittable Meta > --- > > Key: HBASE-11288 > URL: https://issues.apache.org/jira/browse/HBASE-11288 > Project: HBase > Issue Type: Umbrella > Components: meta >Reporter: Francis Christopher Liu >Assignee: Francis Christopher Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
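The endpoint swap argued for above (hand clients the same host list, with only the port numbers changing) can be sketched as a client-side configuration change. This is a hedged sketch: the `hbase.masters` and `hbase.client.registry.impl` property names follow the HBASE-18095 MasterRegistry work, and exact names, class names, and ports should be checked against your release's documentation.

```xml
<!-- Before: clients bootstrap via the ZooKeeper quorum string. -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>host1:2181,host2:2181,host3:2181</value>
</property>

<!-- After (sketch): the same list of hostnames, now pointing at master
     RPC ports, with the client registry switched to the master-backed
     implementation from HBASE-18095. -->
<property>
  <name>hbase.masters</name>
  <value>host1:16000,host2:16000,host3:16000</value>
</property>
<property>
  <name>hbase.client.registry.impl</name>
  <value>org.apache.hadoop.hbase.client.MasterRegistry</value>
</property>
```

Operationally this is the appeal: the value distributed to clients keeps the same shape, so whatever mechanism served out the ZK quorum string can serve out the master string unchanged.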
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112539#comment-17112539 ] Andrew Kyle Purtell edited comment on HBASE-11288 at 5/20/20, 7:03 PM: --- bq. But anyway, the ConnectionRegsitry is pluggable, so for users who can not control things other than HBase, they could use the new registry implementation in HBASE-18095 to reduce the load of zookeeper, [~zhangduo] If we are going to discuss this in this level of detail, I feel I have to enumerate why we built it, which doesn't have anything to do with load per se, just to clarify: - For configuring for fail fast, having to think about zk connection configuration particulars in addition to HBase RPC configuration is doable, but clumsy, and limiting, and not always done correctly. It matters when you operate at scale and have a number of different internal customers with different expectations about retry or fail-fast behavior. A monolithic deploy / service organization may not have this, which is fine, it's optional. - It's a security problem that zk is exposed to clients. ZK's security model is problematic. (3.5 and up can be better with TLS, requiring successful client and server cert-based auth before accepting any requests, but we don't support the ZK TLS transport out of the box actually.) For operators with this concern, now they can isolate the ZK service from end users with network or host ACLs, and HBase can service those clients still. was (Author: apurtell): bq. 
But anyway, the ConnectionRegsitry is pluggable, so for users who can not control things other than HBase, they could use the new registry implementation in HBASE-18095 to reduce the load of zookeeper, [~zhangduo] We are going to discuss this in this level of detail, I feel I have to enumerate why we built it, which doesn't have anything to do with load per se, just to clarify: - For configuring for fail fast, having to think about zk connection configuration particulars in addition to HBase RPC configuration is doable, but clumsy, and limiting, and not always done correctly. It matters when you operate at scale and have a number of different internal customers with different expectations about retry or fail-fast behavior. A monolithic deploy / service organization may not have this, which is fine, it's optional. - It's a security problem that zk is exposed to clients. ZK's security model is problematic. (3.5 and up can be better with TLS, requiring successful client and server cert-based auth before accepting any requests, but we don't support the ZK TLS transport out of the box actually.) For operators with this concern, now they can isolate the ZK service from end users with network or host ACLs, and HBase can service those clients still. > Splittable Meta > --- > > Key: HBASE-11288 > URL: https://issues.apache.org/jira/browse/HBASE-11288 > Project: HBase > Issue Type: Umbrella > Components: meta >Reporter: Francis Christopher Liu >Assignee: Francis Christopher Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
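The pluggability being discussed can be illustrated with a toy model. This is NOT the real HBase `ConnectionRegistry` API (which lives in `org.apache.hadoop.hbase.client`); the interface, class names, and config key below are simplified stand-ins showing how a configuration key selects which bootstrap implementation a client uses.

```java
import java.util.Map;

// Toy stand-in for the pluggable registry interface discussed above.
// NOT the real HBase API; names are illustrative only.
interface ConnectionRegistry {
    // Endpoint a client should contact to bootstrap cluster metadata.
    String bootstrapEndpoint();
}

// Stand-in for the classic ZooKeeper-backed registry.
class ZkRegistry implements ConnectionRegistry {
    public String bootstrapEndpoint() { return "zk-quorum:2181"; }
}

// Stand-in for the HBASE-18095-style master-backed registry.
class MasterRegistry implements ConnectionRegistry {
    public String bootstrapEndpoint() { return "master:16000"; }
}

public class RegistryDemo {
    // Mimics reading a registry-implementation key from configuration,
    // defaulting to the ZooKeeper-backed registry.
    static ConnectionRegistry create(Map<String, String> conf) {
        String impl = conf.getOrDefault("registry.impl", "zk");
        return impl.equals("master") ? new MasterRegistry() : new ZkRegistry();
    }

    public static void main(String[] args) {
        System.out.println(create(Map.of("registry.impl", "master")).bootstrapEndpoint());
    }
}
```

Because the registry is behind an interface, a deployment that wants to keep ZK away from end users (the security concern above) swaps the implementation without touching client code.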
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112035#comment-17112035 ] Francis Christopher Liu edited comment on HBASE-11288 at 5/20/20, 10:33 AM: {quote} The solution is to make meta splittable... But this is not possible for root table right? {quote} Right but there's being able to scale and there's getting hammered/spammed. For the latter you could still run into issues and still need a solution. {quote} Well, for me I would say they are not the same... The location for root is on zk while meta is in root, and root can not be split but meta can... {quote} True but they are both tables. Operationally that makes things more intuitive. Eg load balancing across servers etc. was (Author: toffer): {quote} The solution is to make meta splittable... But this is not possible for root table right? {quote} There's being able to scale and there's getting hammered/spammed. For the latter you could still run into issues and still need a solution. {quote} Well, for me I would say they are not the same... The location for root is on zk while meta is in root, and root can not be split but meta can... {quote} True but they are both tables. Operationally that makes things more intuitive. Eg load balancing across servers etc. > Splittable Meta > --- > > Key: HBASE-11288 > URL: https://issues.apache.org/jira/browse/HBASE-11288 > Project: HBase > Issue Type: Umbrella > Components: meta >Reporter: Francis Christopher Liu >Assignee: Francis Christopher Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112028#comment-17112028 ] Francis Christopher Liu edited comment on HBASE-11288 at 5/20/20, 10:25 AM: {quote} I think we should figure out how this will work. Can be done in parallel in another issue. Baseline is old clients keep working. {quote} Yep, let's make backward compatibility work. The simplest would be to just proxy requests, like the rest or thrift server does now, if a request for meta hits the regionserver hosting root. Should be fairly straightforward. Yeah another branch on top of this one maybe just to not muck things up as we are trying stuff. {quote} The tiering of assignment – root going out before anything else, then all of the meta regions and then user-space regions. And that ROOT is just so ugly... the way the rows are done; the crazy comparator. {quote} Tiering - yeah this is a tough one since root holds meta locations, it invariably requires root to be available for writing. What is your concern here...the complexity of the process? Ugly root - The way the row keys are done? The nested thing? It's not that bad, at least the logic is being reused. Tho we probably don't need the "hbase:meta," prefix on all rows, tho I'm not sure what we gain? {quote} Currently we have meta and namespace and acls and we still don't get it right (Duo helped us out here by merging namespace into meta table in master branch). {quote} ACLs I've never had an issue with? Namespace AFAIK needed attention and got kicked around a bit? It's better it's in hbase:meta now... no one will ignore that table, and there's less specialized code. This is a bit different from the tiering assignment issue tho? {quote} We should do up a bit of a doc on how this will work; how it will work for old and new clients (e.g. handing out the whole meta table when client does meta location lookup); how the proxying will be done so old clients will work when meta splits, etc. 
{quote} Sounds good. Do you want to brainstorm? Or should I write a proposal then we work on it? was (Author: toffer): {quote} I think we should figure out how this will work. Can be done in parallel in another issue. Baseline is old clients keep working. {quote} Yep let's make backward compatibility work. The simplest would be to just proxy requests like rest or thrift server does now. If a request for meta hits the regionserver hosting root. Should be fairly straightforward. Yeah another branch on top of this one maybe just to not muck things up as we are trying stuff. {quote} The tiering of assignment – root going out before anything else, then all of the meta regions and then user-space regions. And that ROOT is just so ugly... the way the rows are done; the crazy comparator. {quote} Tiering - yeah this is a tough one since root holds meta locationsit invariable requires root to be available for writing. What is your concern here...the complexity of the process? Ugly root - The way the row keys are done? The nested thing? It's not that bad at least the logic is being reused. Tho we probably don't need the "hbase:meta," prefix on all rowstho I"m not sure what we gain? {quote} Currently we have meta and namespace and acls and we still don't get it right (Duo helped us out here by merging namespace into meta table in master branch). {quote} ACLs I've never had an issue with? Namespace AFAIK needed attention and got kicked around a bit? It's better it's in hbase:meta now...no one will ignore that table. This is a bit different from tiering assignment issue tho? {quote} We should do up a bit of a doc on how this will work; how it will work for old and new clients (e.g. handing out the whole meta table when client does meta location lookup); how the proxying will be done so old clients will work when meta splits, etc. {quote} Sounds good. Do you want to brainstorm? Or should I write a proposal then we work on it? 
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111992#comment-17111992 ] Francis Christopher Liu edited comment on HBASE-11288 at 5/20/20, 9:57 AM: --- {quote} Just do not see any difference if we just have a single root region on a region server? For all the cases, the client will also be hammering the region server which hosts the root region and bring down all the cluster... {quote} IMHO this type of workload is a regionserver responsibility and hence there are advantages to it being done on the regionserver. 1. AFAIK hbase:meta today is getting hammered? If so, we'll likely want to fix that if we haven't already? If so then whatever fix/enhancement we did the root table can take advantage of as well? In which case it gives simplicity and code reuse as we are applying one solution to two problems. Although we'd have to add code for root getting assigned first etc, which I believe should be relatively straightforward with procedures. On the other hand if we use the master and backup masters we will be giving them a new specialized responsibility, which will introduce new specialized code. And then we would need to introduce more code for the master(s) to not get hammered. 2. From an operations perspective it would be easier for operations folks to reason about things as the way both catalog tiers are handled is the same and so the way to manage them if there are issues is the same. 3. In recovery scenarios the master is already busy doing its master work, so it would be better if it could focus its resources on that as it does not horizontally scale. Please correct me if I'm making any wrong assumptions here. Let me know what you think? was (Author: toffer): {quote} Just do not see any difference if we just have a single root region on a region server? For all the cases, the client will also harmmering the region server which hosts the root region and bring down all the cluster... 
{quote} IMHO this type of workload is a regionserver responsibility and hence there are advantages to it being done on the regionserver. 1. AFAIK hbase:meta today is getting hammered? If so, we'll likely want to fix that if we haven't already? If so then whatever fix/enhancement we did the root table can take advantage of as well? In which case it gives simplicity and code reuse as we are applying one solution to two problems. Although we'd have to add code for root getting assigned first etc, which I believe should be relatively straightforward with procedures. On the other hand if use the master and backup masters we will be giving them a new specialized responsibility, which will introduce new specialized code. And then we would need to introduce more code for the master(s) to not get hammered. 2. From an operations perspective it would be easier for operations folks to reason about things as the way both catalog tiers are handled are the same and so the way to manage them if there are issues are the same. Please correct me if I'm making any wrong assumptions here. Let me know what you think? > Splittable Meta > --- > > Key: HBASE-11288 > URL: https://issues.apache.org/jira/browse/HBASE-11288 > Project: HBase > Issue Type: Umbrella > Components: meta >Reporter: Francis Christopher Liu >Assignee: Francis Christopher Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111060#comment-17111060 ] Francis Christopher Liu edited comment on HBASE-11288 at 5/19/20, 10:42 AM: {quote} We store the procedure data in a local HRegion on master. {quote} I see, thanks for the explanation and the link. {quote} And on client accessing meta, I think for root region it is fine. We do not mirror all the requests to region servers to master, we will only contact master when locating meta, which should rarely happen. {quote} I can think of a couple of cases where it might not be good/ideal: 1. Large batch jobs that do lookups on an hbase table in tasks would hit it on startup. Same for large application restarts (e.g. storm topologies, etc.). 2. In the event of failure degradation is not as graceful, as if clients are not able to find meta regions they will start hammering the master when the master is already busy trying to recover the cluster. 3. Generally just misbehaving actors (e.g. poorly written applications) or buggy third party client implementations that users attempt to use. {quote} And now we even have a MasterRegistry on 2.x, where we move the locating meta requests from zk to master. {quote} I think what we might do is add a resolver plugin so all master registry requests just go to the backup masters. I've not looked at the code yet but maybe we can have the backup masters serve out the local hregion content from the backup masters? IMHO ideally the registry should've been served out of the regionservers. Today we have a "system" regionserver group where all the system tables are served out of (including root). With the new setup we would have a system group and an extra set of master servers for serving out root and registry stuff and the extra operational nuance of backup masters not just being backup but doing active work for the system. Just laying out my concerns, let me know what you guys think? 
Let me know if it's reasonable/unreasonable. was (Author: toffer): {quote} We store the procedure data in a local HRegion on master. {quote} I see thanks for the explanation and the link. {quote} And on client accessing meta, I think for root region it is fine. We do not mirror all the requests to region servers to master, when will only contact master when locating meta, which should be rarely happen. {quote} I can think of a couple of cases were it might not be good/ideal: 1. Large batch jobs that does lookups on an hbase table in tasks would hit it on startup. Same for large application restarts (eg storm topologies, etc). 2. In the event of failure degradation is not as graceful, as if clients are not able to find meta regions they will start hammering the master when the master is already busy trying to recover the cluster. 3. Generally just misbehaving actors (eg poorly written applications) or buggy third party client implementations that users attempt to use. {quote} And now we even have a MasterRegistry on 2.x, where we move the locating meta requests from zk to master. {quote} I think what we might do is add a resolver plugin so all master registry requests just go to the backup masters. I've not looked at the code yet but maybe we can have the backup masters serve out the local hregion content from the backup masters? IMHO ideally the registry should've been served out of the regionservers. Today we have a "system" regionserver group where all the system tables served out of (including root). With the new setup we would have a system group and an extra set of master servers for serving out root and registry stuff and the extra nuance of backup masters not just being backup but doing active work for the system. Just laying out my concerns let me know what you guys think? 
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110755#comment-17110755 ] Michael Stack edited comment on HBASE-11288 at 5/19/20, 1:20 AM: - Can we split meta in a way such that older clients keep working? Perhaps old clients get a proxying Region that fields requests to the actual split meta table so they keep working? Can we NOT do a ROOT table w/ its crazy meta-meta comparator? Yeah, when registry is hosted by Master, master now carries some location ops we used to farm out to zk. Master-hosted registry could host Meta table Region locations too... seems apt for a Registry. We've not been good at tiers of assign; i.e. ROOT first, then META regions. was (Author: stack): Can we split meta in a way such that older clients keep working? Perhaps old clients get a proxying Region that fields requests to the actual split meta table so they keep working? Can we NOT do a ROOT table w/ its crazy meta-meta comparator? Yeah, when registry is hosted by Master, master now carries some location ops we used farm out to zk. Master-hosted registry could host Meta table Region locations too... seems apt for a Registry. > Splittable Meta > --- > > Key: HBASE-11288 > URL: https://issues.apache.org/jira/browse/HBASE-11288 > Project: HBase > Issue Type: Umbrella > Components: meta >Reporter: Francis Christopher Liu >Assignee: Francis Christopher Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073594#comment-17073594 ] Francis Christopher Liu edited comment on HBASE-11288 at 4/2/20, 10:35 AM: --- Hi, I pushed up my current WIP/Draft patch for feedback as PR on [github|https://github.com/apache/hbase/pull/1418]. Please let me know what you guys think. was (Author: toffer): Hi, I pushed up my current WIP/Draft patch for feedback as PR on [github|https://github.com/apache/hbase/pull/1418]. > Splittable Meta > --- > > Key: HBASE-11288 > URL: https://issues.apache.org/jira/browse/HBASE-11288 > Project: HBase > Issue Type: Sub-task >Reporter: Francis Christopher Liu >Assignee: Francis Christopher Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942317#comment-16942317 ] Francis Christopher Liu edited comment on HBASE-11288 at 10/1/19 9:11 PM: -- {quote} I'd like to see old clients keep working though meta has split under them. Perhaps the serverName in the MetaRegionServer znode proxies requests when the region asked for is named 'hbase:meta,,1.1588230740' to actual meta table whose regions could be elsewhere on the cluster. I suppose 'hbase:meta,,1.1588230740' would have to be the old ROOT too? {quote} Oh interesting idea. How about, instead of proxying, when a request for meta hits the RS with the ROOT region we could throw a RegionMovedException to redirect the client to the proper meta region RS. On the RS holding the meta region we translate the 'hbase:meta,,1.1588230740' region info to the appropriate meta region info if split. This way we can avoid proxying which seems to make things a bit more complicated? {quote} Lets do a one-pager. {quote} Sounds good. was (Author: toffer): {quote} I'd like to see old clients keep working though meta has split under them. Perhaps the serverName in the MetaRegionServer znode proxies requests when the region asked for is named 'hbase:meta,,1.1588230740' to actual meta table whose regions could be elsewhere on the cluster. I suppose 'hbase:meta,,1.1588230740' would have to be the old ROOT too? {quote} Oh interesting idea. How about instead of proxying we could throw a RegionMovedException to redirect to the proper meta region RS when a requests for meta hits the RS with the ROOT region. On the RS holding the meta region we translate the 'hbase:meta,,1.1588230740' region info to the appropriate meta region info if split. This way we can avoid proxying which seems to make things a bit more complicated? {quote} Lets do a one-pager. {quote} Sounds good. 
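The redirect idea above can be sketched as a toy model. This is NOT real HBase code: `RegionMoved` stands in for `RegionMovedException`, and the location table is hypothetical. It shows the shape of the interaction — an old client asking the ROOT host for the legacy 'hbase:meta,,1.1588230740' region is bounced to the regionserver holding the right meta slice, with no server-side proxying.

```java
import java.util.Map;

// Toy stand-in for HBase's RegionMovedException, carrying the new location.
class RegionMoved extends RuntimeException {
    final String newLocation;
    RegionMoved(String loc) { super("region moved to " + loc); this.newLocation = loc; }
}

public class RootServerSketch {
    static final String LEGACY_META = "hbase:meta,,1.1588230740";

    // Hypothetical lookup: row key -> regionserver hosting that slice of meta.
    final Map<String, String> metaLocations =
        Map.of("tableA", "rs1:16020", "tableZ", "rs2:16020");

    String handle(String regionName, String rowKey) {
        if (regionName.equals(LEGACY_META)) {
            // Old client asked the ROOT host for the monolithic meta region:
            // redirect it to the regionserver holding the right meta slice.
            throw new RegionMoved(metaLocations.get(rowKey));
        }
        return "served locally";
    }

    // Client side: on RegionMoved, retry against the reported location.
    static String clientLookup(RootServerSketch root, String rowKey) {
        try {
            return root.handle(LEGACY_META, rowKey);
        } catch (RegionMoved e) {
            return "retried at " + e.newLocation;
        }
    }

    public static void main(String[] args) {
        System.out.println(clientLookup(new RootServerSketch(), "tableA"));
    }
}
```

Proxying, by contrast, would have the ROOT host forward the request and relay the answer itself; the redirect keeps routing on the client, which is the simplification argued for in the comment.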
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941446#comment-16941446 ] Francis Christopher Liu edited comment on HBASE-11288 at 10/1/19 1:21 AM: -- {quote} How many rows in your biggest hbase:root? And how many regions in hbase:meta have you gotten to? {quote} We have a cluster that has 1.04 million regions. That same cluster has 119 hbase:meta regions. We configured the split size to be small. {quote} A lot has changed since 1.3 but patch will help. {quote} Yes, it shows two important things: 1. The approach of adding a splittable hbase:meta is fairly straightforward and can be applied to other branches (even tho a lot has changed), 2. The approach works :-). Next question is where do we go from here? A patch for master? was (Author: toffer): {quote} How many rows in your biggest hbase:root? And how many regions in hbase:meta have you gotten to? {quote} We have a cluster that has 1.04 million regions. That same cluster has 119 hbase:meta regions. We configured the split size to be small. {quote} A lot has changed since 1.3 but patch will help. {quote} Yes, it shows two important things: 1. The approach of adding hbase:meta is fairly straightforward and can be applied to other branches (even tho a lot has changed), 2. The approach works :-). Next question is where do we go from here? A patch for master? > Splittable Meta > --- > > Key: HBASE-11288 > URL: https://issues.apache.org/jira/browse/HBASE-11288 > Project: HBase > Issue Type: Sub-task >Reporter: Francis Christopher Liu >Assignee: Francis Christopher Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940663#comment-16940663 ] Francis Christopher Liu edited comment on HBASE-11288 at 9/30/19 6:06 AM: -- Apologies for the delay here's a short [writeup|https://docs.google.com/document/d/1cs_sRC5xbK2JPdw99kqUqUGMJ0yHier5QPx21al619g/edit]. Let me know if it needs more detail or feel free to comment. BTW I had to rebase the branch. was (Author: toffer): Apologies for the delay here's a short [writeup|[https://docs.google.com/document/d/1cs_sRC5xbK2JPdw99kqUqUGMJ0yHier5QPx21al619g/edit]] . Let me know if it needs more detail or feel free to comment. BTW I had to rebase the branch. > Splittable Meta > --- > > Key: HBASE-11288 > URL: https://issues.apache.org/jira/browse/HBASE-11288 > Project: HBase > Issue Type: Sub-task >Reporter: Francis Christopher Liu >Assignee: Francis Christopher Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-11288) Splittable Meta
[ https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936585#comment-16936585 ] Francis Liu edited comment on HBASE-11288 at 9/24/19 9:16 AM: -- Here's a WIP branch: [https://github.com/francisliu/hbase/tree/apache_1.3_splitmeta] I'm working on stripping down most of the changes to only what's needed to split meta. The rest can come as follow-ons if needed. At its core there's actually not much change, the patch is partly big because there are a lot of renames touching a lot of files. So with this branch we can get a very good picture of the changes needed. I'm still running the unit tests. Will move this to a feature branch in the hbase repo as soon as most of the tests are passing. I'll add more info tomorrow. Next question is what we want to do. Should I get everything we want working on the 1.3 branch or should I port this to trunk or 1.x or? was (Author: toffer): Here's a WIP branch: [https://github.com/francisliu/hbase/tree/apache_1.3_splitmeta] I'm working on stripping down most of the changes to only what's needed to split meta. The rest can come as follow-ons if needed. At it's core there's actually not much change, the patch is partly big because there's a lot of renames which are touching a lot of files. So with this branch we can get a very good picture of the changes needed. I'm still running the unit tests. Will move this to a feature branch in the hbase repo as soon as most of the tests are passing. Next question is what we want to do. Should I get everything we want working on the 1.3 branch or should I port this to trunk or 1.x or? > Splittable Meta > --- > > Key: HBASE-11288 > URL: https://issues.apache.org/jira/browse/HBASE-11288 > Project: HBase > Issue Type: Sub-task >Reporter: Francis Liu >Assignee: Francis Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)