[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-12-04 Thread Francis Christopher Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244405#comment-17244405
 ] 

Francis Christopher Liu edited comment on HBASE-11288 at 12/5/20, 5:38 AM:
---

Good idea [~stack]. +1 Let's leave the baggage here so it'll be easier for the 
community to get involved.


was (Author: toffer):
Good idea [~stack]. +1 Let's leave the baggage here so it'll be easier for the 
community to get invovled.

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
> Attachments: jstack20200807_bad_rpc_priority.txt, root_priority.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-08-10 Thread Francis Christopher Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175227#comment-17175227
 ] 

Francis Christopher Liu edited comment on HBASE-11288 at 8/11/20, 4:59 AM:
---

Apologies for the late reply. I got caught up dealing with issues getting my 
setup running. Right now it's running and 4 of 5 iterations have finished. I'm 
using the command [~stack] shared.

Here's a rundown of some of the things I dealt with so far.

1. Disabled netty native transport - regionserver JVMs would basically freeze. 
Couldn't even get a jstack without "-F". I found out that internally we don't 
run Hadoop with native transport as it apparently has a lot of issues (wrong 
fds getting closed, etc). See the config sketch after this list.
2. Disabled security - there was an issue with TokenProvider throwing 
IllegalThreadStateException on startup, causing the RS to abort.
3. Hadoop 2.8 and ZooKeeper 3.4.x - made some quick changes so I could 
downgrade the dependencies, as these are what we use internally. Shouldn't 
affect the validity of the test?
4. MonitorTaskImpl.prettyPrintJournal was throwing an NPE causing the RS to 
abort - might be related to me using "filesystem" as the WAL provider. I 
changed the code to swallow the exception for now as it seems unrelated. Will 
try to rerun the test using the async WAL provider.
5. SCP bug introduced by the patch - caused meta not to get assigned. Caught 
and fixed this quickly and early.
6. [~stack]'s root executor patch - didn't run into this, but it makes sense, 
so I've applied the patch to the branch and have been running with it.
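
A minimal sketch of the configuration tweaks behind items 1 and 4 above, 
expressed as plain Hadoop Configuration settings. The key names 
("hbase.netty.nativetransport", "hbase.wal.provider") and the "asyncfs" value 
are assumptions from memory rather than something stated in this thread; 
verify them against the hbase-default.xml of the version under test before 
relying on them.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class TestRunConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Item 1: fall back to NIO instead of netty's native (epoll) transport.
    conf.setBoolean("hbase.netty.nativetransport", false);
    // Item 4: this run used the "filesystem" WAL provider; a rerun would
    // switch to the async provider instead.
    conf.set("hbase.wal.provider", "asyncfs");
    System.out.println(conf.get("hbase.wal.provider"));
  }
}
{code}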




was (Author: toffer):
Apologies for the late reply. I got caught up dealing with the issues to get my 
setup running. Right now it's running and 4 of 5 iterations has finished. I'm 
using the command [~stack] shared.

Here's a rundown of some of the things I dealt with so far.

1. Disabled netty native transport - regionservers JVM's would basically 
freeze. Couldn't even do jstack without "-F". I found out that internally we 
don't run with native transport for hadoop as it has a lot of issues apparently 
(wrong fds getting closed, etc).
2. Disabled security - there was some issue with TokenProvider throwing 
IllegalThreadStateException on startup causing the RS to abort
3. Hadoop-2.8  and zookeeper 3.4.x - made some quick changes so I could 
downgrade the dependencies as these are what we use internally. Shouldn't 
affect the validity of the test?
4. MonitorTaskImpl.prettyPrintJournal was throwing an NPE causing RS to abort - 
might be related to me using "filesystem" as the wal provider. I changed the 
code to swallow the exception for now as it seems unrelated. Will try and rerun 
the test use Async wal provider.
5. SCP bug introduced by patch - caused meta not to get assigned. Caught and 
fixed this quick and early.
6. [~stack] root executor patch - didn't run into this, but makes sense.



> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
> Attachments: jstack20200807_bad_rpc_priority.txt, root_priority.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-08-07 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173524#comment-17173524
 ] 

Michael Stack edited comment on HBASE-11288 at 8/8/20, 12:28 AM:
-

On the latest ITBLL run, I ran into a deadlock: a combination of chaos monkey 
DDOS'ing the 'default' RPC handlers and hbase:root failing to report 
successful open because its requests ran at 'default' priority rather than 
'priority' – it was locked out.

Will attach a jstack and a patch to make it so hbase:root ops get priority 
(the patch is a hack... needs polish... it adds a root executor and ups the 
priority for root-table ops).

[^jstack20200807_bad_rpc_priority.txt]

[^root_priority.patch]
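
A minimal, self-contained sketch of the idea described above: bump the QoS of 
anything touching a hypothetical hbase:root table above the normal threshold 
so it lands on dedicated priority handlers and cannot be starved by user-space 
traffic on the 'default' pool. This is illustrative only, not the attached 
root_priority.patch, and the QoS constants are made up rather than HBase's own.

{code:java}
public class RootPrioritySketch {
  // Illustrative QoS levels; HBase defines its own constants for these.
  static final int NORMAL_QOS = 0;
  static final int ROOT_QOS = 300; // hypothetical: above meta/system priority

  /** Bump the priority of any request that targets the root table. */
  static int getPriority(String tableName, int currentPriority) {
    if ("hbase:root".equals(tableName)) {
      return Math.max(currentPriority, ROOT_QOS);
    }
    return currentPriority;
  }

  public static void main(String[] args) {
    System.out.println(getPriority("hbase:root", NORMAL_QOS));        // 300
    System.out.println(getPriority("default:usertable", NORMAL_QOS)); // 0
  }
}
{code}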

 


was (Author: stack):
On latest ITBLL run, I ran into a deadlock, a combination of chaos monkey 
DDOS'ing the 'default' RPC handlers and then hbase:root failing to report 
successful open because its 'priority' was default rather than priority – it 
was locked out.

Will attach jstack and patch to make it so hbase:root ops get priority (the 
patch is a hack...needs polish... adds a root executor and upping priority if 
root table)

[^jstack20200807_bad_rpc_priority.txt]

 

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
> Attachments: jstack20200807_bad_rpc_priority.txt, root_priority.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-08-07 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173524#comment-17173524
 ] 

Michael Stack edited comment on HBASE-11288 at 8/8/20, 12:27 AM:
-

On latest ITBLL run, I ran into a deadlock, a combination of chaos monkey 
DDOS'ing the 'default' RPC handlers and then hbase:root failing to report 
successful open because its 'priority' was default rather than priority – it 
was locked out.

Will attach jstack and patch to make it so hbase:root ops get priority (the 
patch is a hack...needs polish... adds a root executor and upping priority if 
root table)

[^jstack20200807_bad_rpc_priority.txt]

 


was (Author: stack):
On latest ITBLL run, I ran into a deadlock, a combination of chaos monkey 
DDOS'ing the 'default' RPC handlers and then hbase:root failing to report 
successful open because its 'priority' was default rather than priority – it 
was locked out.

Will attach jstack and patch to make it so hbase:root ops get priority.

 

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
> Attachments: jstack20200807_bad_rpc_priority.txt, root_priority.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-07-14 Thread Francis Christopher Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157846#comment-17157846
 ] 

Francis Christopher Liu edited comment on HBASE-11288 at 7/15/20, 3:39 AM:
---

I am all for “collaborate and work out a path forward”. How do we go about 
doing that? My previous proposal tried to add some rigor by tackling what, 
based on earlier discussions and the documentation, seemed to be the most 
pressing issue (in terms of picking one over the other): tiering. Any 
proposal?

I have some technical responses to the recent posts here but will try to hold 
off until we can agree on a path forward.

(78 words)



was (Author: toffer):
I am all for “collaborate and work out a path forward”. How do we go about 
doing that? My previous proposal was tackling the most pressing issue which 
seemed based on earlier discussions and the documentation was tiering. Any 
proposal?

I have some technical responses to the recents posts here but will try to hold 
off until we can agree on a path forward.

(64 words)


> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-07-13 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17156783#comment-17156783
 ] 

Michael Stack edited comment on HBASE-11288 at 7/13/20, 3:27 PM:
-

In this issue, I want a split meta – not fixes for hotspotting nor to develop 
a cache for the 'meta' table. There are alternative approaches to splitting 
meta – one that has a version in production at scale on hbase1, with multiple 
attempts at landing the large patch updated for hbase2, and another that is 
being made up on the fly via PRs, making use of some nice new features that 
have emerged of late. Given alternatives, I want us to collaborate and work 
out a path forward; an agreed-upon design with our decisions (PRs w/o design 
end up in a pissed-off OA annoyed at dumb questions trying to elicit 
architecture) (112 words not including these).


was (Author: stack):
In this issue, I want a split meta – not fixes for hotspotting nor to develop a 
cache for the 'meta' table. There are alternative approaches splitting meta – 
one that has a version in production at scale on hbase1 with multiple attempts 
at landing the large patch updated for hbase2 and another that is being made up 
on the fly via PRs making use of some nice new features that have emerged of 
late. Given alternatives, I want us to collaborate and work out a path forward; 
an agreed upon design with our decisions. (95 words not including these).

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-07-09 Thread Duo Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154253#comment-17154253
 ] 

Duo Zhang edited comment on HBASE-11288 at 7/9/20, 7:28 AM:


{quote}
Let's follow Stack's previous comment? Let's not discuss replicas or caching as 
they could be applied to either split meta implementation. Let's focus on the 
tiering issue?
{quote}

I do not know how we can make a cache server for a general root table. Please 
give a clear design for how to do it.

{quote}
I'm missing something. It is a POC and its intent was to gather information and 
help answer critical questions like this. So I'm not sure what the concern is 
against POCs doing what POCs are supposed to be used for? Rest assured for this 
particular case I am more concerned as to how we came to the decision than the 
actual decision. It would be best for everyone if we could apply some rigor for 
something so important.
{quote}
It was you who suggested we run ITBLL against the POC, not me. Correct me if 
I'm wrong.

{quote}
It is definitely good for that. But what prevents us from using it for this 
purpose? Wether it always passes, fails sometimes or fail always we learn 
something and that is valuable for determining wether proc v2 can support 
2-tier now, short term, long term or never. Then we can come to an informed 
decision. For example we might decide we cannot wait that long for proc v2 to 
mature so we go with 1-tier.
{quote}
I have shown my opinion above. I do not think this will help. But you're free 
to run ITBLL yourself and post the results here; no one can stop you from 
doing that, right?

{quote}
It does help but it is still making a compromise to avoid the concerns of 
2-tier. Which is why my main concern now is applying some rigor to help us come 
to a well informed decision as to wether proc v2 cannot support it and we 
should go with 1-tier. I think proc v2 is what was missing that prevented us 
from succeeding in the past although it’s possible it may not be mature enough 
at this stage.
{quote}

I'm a bit confused. Why do you think that if proc-v2 can support 2-tier then 
we should use 2-tier? Please focus on the problem of the 'master local region' 
itself? Or is your point that the 'master local region' does not use 2-tier so 
it is not a good solution? This does not make sense to me.

Thanks.


was (Author: apache9):
{quote}
Let's follow Stack's previous comment? Let's not discuss replicas or caching as 
they could be applied to either split meta implementation. Let's focus on the 
tiering issue?
{quote}

I do not know how can we make a cache server for a general root table. Please 
give a clear design on how to do it.

{quote}
I'm missing something. It is a POC and its intent was to gather information and 
help answer critical questions like this. So I'm not sure what the concern is 
against POCs doing what POCs are supposed to be used for? Rest assured for this 
particular case I am more concerned as to how we came to the decision than the 
actual decision. It would be best for everyone if we could apply some rigor for 
something so important.
{quote}
It is you that suggestted we run ITBLL against the POC, not me. Correct me if 
I'm wrong.

{quote}
It is definitely good for that. But what prevents us from using it for this 
purpose? Wether it always passes, fails sometimes or fail always we learn 
something and that is valuable for determining wether proc v2 can support 
2-tier now, short term, long term or never. Then we can come to an informed 
decision. For example we might decide we cannot wait that long for proc v2 to 
mature so we go with 1-tier.
{quote}
I have shown my opinion above. I do not think this will help. But you're free 
to run ITBLL by yourself and post the result here, no one can stop you doing 
this right?

{quote}
It does help but it is still making a compromise to avoid the concerns of 
2-tier. Which is why my main concern now is applying some rigor to help us come 
to a well informed decision as to wether proc v2 cannot support it and we 
should go with 1-tier. I think proc v2 is what was missing that prevented us 
from succeeding in the past although it’s possible it may not be mature enough 
at this stage.
{quote}

I'm a bit concerned. Why do you think if proc-v2 can support 2-tier than we 
should use 2-tier? Please focus on the problem of the 'master local region' 
itself? Or your point is that 'master local region' does not use 2-tier so it 
is not a good solution? This does not make sense to me.

Thanks.

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-05-28 Thread Francis Christopher Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118515#comment-17118515
 ] 

Francis Christopher Liu edited comment on HBASE-11288 at 5/28/20, 10:01 AM:


[~zhangduo][~stack] since you guys had big concerns with tiered assignment, I 
would appreciate comments on anything I might be doing in the PR that's 
undesirable, as well as suggestions if you have any. Note the PR is a draft so 
there is likely room for improvement. Past that, would a long run of 
ChaosMonkey be a way to validate that it's stable?


was (Author: toffer):
[~zhangduo][~stack] since you guys had big concerns with tiered assignment, 
would appreciate some comments in anything I might be doing undesirable in the 
PR as well as suggestions if you have any. Past that would a long run of 
ChaosMonkey be a way to validate that it's stable?

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-05-28 Thread Francis Christopher Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118504#comment-17118504
 ] 

Francis Christopher Liu edited comment on HBASE-11288 at 5/28/20, 9:54 AM:
---

{quote}
I will try to implement splittable meta by storing the root table in a 'master 
local region', on the feature branch.
{quote}
I see. Is the intent to do a POC with the master local region?

{quote}
HBASE24389, the goal is to move the location of meta to the 'master local 
region', and also remove the assumption that there is only a single meta region 
as much as possible in code, without actually split the meta table. 
{quote}
I see. From my experience having written this patch a few times: adding root 
and then generalizing hbase:meta (e.g. moving away from a single meta region) 
is most of the code change; the actual meta-splitting code is not a lot, since 
it's basically just another region. Also, the true way to test whether you've 
done the job is to actually have more than one meta region, so it doesn't 
really make sense to separate generalizing meta handling from actually 
splitting the table; it might actually be more work if you do it that way.




was (Author: toffer):
{quote}
I will try to implement splittable meta by storing the root table in a 'master 
local region', on the feature branch.
{quote}
I see is the intent to do a POC with master local region?

{quote}
HBASE24389, the goal is to move the location of meta to the 'master local 
region', and also remove the assumption that there is only a single meta region 
as much as possible in code, without actually split the meta table. 
{quote}
I see, from my experience having written this patch a few times. Adding root 
and then generalizing hbase:meta  (eg moving away from single meta region) 
regions is most of the code changes as well as the true way to test if you've 
done your job is to actually have more than one meta so it doesn't really make 
sense to split generalizing meta handling and actually splitting the table 
there might actually be more work. 



> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-05-27 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118032#comment-17118032
 ] 

Michael Stack edited comment on HBASE-11288 at 5/27/20, 8:10 PM:
-

[I started a one-pager design 
doc|https://docs.google.com/document/d/1OYrCfpmmLPkSa5-AepQw8hv09zNcNPaFzSXezhcMS7E/edit#].
Pile on. It currently has only some Requirements and Questions.


was (Author: stack):
I started a one-pager design doc. Pile on. Has some Requirements and Questions 
only currently.

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-05-27 Thread Francis Christopher Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117616#comment-17117616
 ] 

Francis Christopher Liu edited comment on HBASE-11288 at 5/27/20, 10:22 AM:


Hi guys, just following up: what are everyone's thoughts on where meta region 
locations will actually be stored? Are we already sold on one direction over 
another? Do we need more discussion? Do we need more investigation to figure 
out which is better (e.g. a POC)? So far it's between storing them in the 
master and in a root table. Correct me if I'm wrong or oversimplifying 
[~zhangduo], but the main contention/concern seems to be between the 
complexity that tiered assignment adds and putting the responsibility on the 
master?

Also, I see a meta location store for master in another feature branch 
(HBASE-11288.splittable-meta)? Is this a POC or is this the direction we are 
going? Let me know, as I've rebased the root patch and have been working on 
adding split meta support to the HBASE-11288 feature branch.


was (Author: toffer):
Hi guys, just following up on what everyone's thoughts is for where meta region 
locations will actually be stored? Are we already sold on one direction over 
another? Do we need more discussion? Do we need  investigation to figure out 
which is better (eg POC)?  So far it's between storing it in the master and the 
root table. Correct me if I'm wrong or oversimplifying [~zhangduo] but it seems 
the main contention/concern seems to be between the complexity that tiering 
assignments add and putting the responsibility on the master? 

Also I see a meta location store for master in another feature branch 
(HBASE-11288.splittable-meta)? Is this a POC or is this the direction we are 
going? Let me know as I've rebased the root patch and working on adding split 
meta support to HBASE-11288 feature branch.

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-05-21 Thread Francis Christopher Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113080#comment-17113080
 ] 

Francis Christopher Liu edited comment on HBASE-11288 at 5/21/20, 11:23 AM:


[~apurtell] thanks for the explanation, I just have a few follow-up questions 
if that's ok.

{quote}
In contrast, what if we start exporting a subset of regionservers. How does 
that work? It's guaranteed to be different from how zk quorum strings worked.
{quote}
I'm pretty sure it's well thought out; I'm just trying to understand the 
thought process in case we need to consider creating a regionserver registry, 
so we understand the tradeoff. My thinking was providing a static subset of 
regionservers the same way that's being done with providing a static list of 
masters. Are we not able to provide a static subset of masters in the current 
implementation?

{quote}
 There are a lot more of them, they are randomly chosen , the connection info 
has to dynamic instead of static (how is that collected? published to 
clients?). 
{quote}
I think there are a few ways of picking the static subset. Let me know if 
these make sense. One way would be to pick regionservers that are part of the 
regionserver group hosting meta. Another way is to provide a seed list and an 
API to allow the client to discover more if it needs to; someone mentioned 
that this is how Cassandra does it, a while back when we were talking about a 
regionserver registry. Another way is to provide just a single domain name 
that resolves to a static subset of regionserver IPs, which the operator can 
choose to update dynamically in case one is decommissioned or a new one is 
added (this should work for the master approach too, though you would need a 
resolver); this way the client config does not need to change.






was (Author: toffer):
[~apurtell] thanks for the explanation, I'just have a few follow-up questions 
if thats ok.

{quote}
In contrast, what if we start exporting a subset of regionservers. How does 
that work? It's guaranteed to be different from how zk quorum strings worked.
{quote}
I pretty sure it's well thought out just understanding the thought process in 
case we need to consider creating a regionserver registry so we understand the 
tradeoff. My thinking was providing a static subset of regionservers the same 
way that's being done with providing a static list of masters. Are we not able 
to provide a static subset of masters in the current implementation? 

{quote}
 There are a lot more of them, they are randomly chosen , the connection info 
has to dynamic instead of static (how is that collected? published to 
clients?). 
{quote}
I think there are many ways to picking the static subset. One way would be to 
pick regionservers that are part of the regionserver group hosting the meta. 
Another way is to provide a seed list and an api to allow the client to 
discover more if they need to someone mentioned that his is how cassandra does 
it a while back when we were takling about a regionserver registry. Another way 
is to provide just a single domain name and that domain name contains a static 
subset list of regionserver ips which the operator can choose to updated 
dynamically in case one is decomissioned or a new on is added (this should work 
for the master approach too tho you would need a resolver) this way the client 
config does not need to change.





> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-05-21 Thread Francis Christopher Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113085#comment-17113085
 ] 

Francis Christopher Liu edited comment on HBASE-11288 at 5/21/20, 11:11 AM:


{quote}
At least we have to provide an API to fecth all the content in root table, as 
the cache service needs it. So for HBCK it could also uses this API. But this 
API will not be part of the client facing API, for client they just need a 
simple locateMeta method.
{quote}
I see. You would need mutation APIs as well to correct the errors.
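
A rough sketch of the API split being discussed here: an internal, 
admin-facing surface that can dump and (per the comment above) mutate raw root 
content for the cache service and HBCK, and a much smaller client-facing call 
that only locates meta. All names and shapes below are hypothetical 
illustrations, not HBase APIs.

{code:java}
import java.util.List;

interface RootStoreAdmin {
  List<byte[]> fetchAllRootRows();            // cache service / HBCK: full dump
  void putRootRow(byte[] row, byte[] value);  // mutations needed to repair errors
  void deleteRootRow(byte[] row);
}

interface ClientMetaLocator {
  /** The only client-facing call: which meta region/server covers this row? */
  String locateMeta(byte[] tableName, byte[] row);
}
{code}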


was (Author: toffer):
{quote}
At least we have to provide an API to fecth all the content in root table, as 
the cache service needs it. So for HBCK it could also uses this API. But this 
API will not be part of the client facing API, for client they just need a 
simple locateMeta method.
{quote}
I see. You would need mutation apis as well.

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-05-20 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112530#comment-17112530
 ] 

Andrew Kyle Purtell edited comment on HBASE-11288 at 5/20/20, 7:19 PM:
---

bq. I agree publishing all the regionserver locations wouldn't make sense, but 
I'm curious and probably missing something as to why can't we publish a subset 
of regionserver locations? 

Because we also want migrating service discovery and configuration 
considerations to closely align, to avoid having to rethink how clients 
discover cluster service. Right now operators have to pass around a string of 
zookeeper endpoints. As you know zookeeper quorums are limited in size by how 
they scale writes. And, in many deployments, ZK and masters might be colocated, 
because they are both _meta_ services (metadata and coordination), and their 
deployments have the same shape both in terms of number of instances and 
placement with respect to failure domains. Sum these considerations, it is 
highly advantageous if you are managing a string of ZK endpoints to bootstrap 
clients today, then tomorrow you have the option of simply replacing the string 
of zk service locations with a string of master service locations. Same order 
of scale. Probably, same failure domain engineering. Possibly, colocation. It 
might be the same list of hostnames served to clients the same way the zk 
quorum string was previously served, with only the port numbers changing.
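
A small sketch of the swap described above, assuming the client configuration 
keys "hbase.zookeeper.quorum" (the existing ZK bootstrap) and "hbase.masters" 
(the master-based bootstrap in newer clients); treat the latter key, the host 
names and the ports as placeholders to check against your HBase version.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class BootstrapSwapSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Today: clients bootstrap from a ZooKeeper quorum string.
    conf.set("hbase.zookeeper.quorum",
        "zk1.example.com,zk2.example.com,zk3.example.com");
    // Tomorrow: the same shape of string, pointing at the masters instead,
    // typically the same order of scale and possibly even the same hosts.
    conf.set("hbase.masters",
        "host1.example.com:16000,host2.example.com:16000,host3.example.com:16000");
  }
}
{code}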

In contrast, what if we start exporting a subset of regionservers? How does 
that work? It's guaranteed to be different from how zk quorum strings worked. 
There are a lot more of them, they are randomly chosen (?), the connection 
info has to be dynamic instead of static (how is that collected? published to 
clients?). I could go on. The alternative, described above, has been well 
thought out, IMHO. 


was (Author: apurtell):
bq. I agree publishing all the regionserver locations wouldn't make sense, but 
I'm curious and probably missing something as to why can't we publish a subset 
of regionserver locations? 

Because we also want migrating service discovery and configuration 
considerations to closely align, to avoid having to rethink how clients 
discover cluster service. Right now operators have to pass around a string of 
zookeeper endpoints. As you know zookeeper quorums are limited in size by how 
they scale writes. And, in many deployments, ZK and masters might be colocated, 
because they are both _meta_ services (metadata and coordination), and their 
deployments have the same shape both in terms of number of instances and 
placement with respect to failure domains. Sum these considerations, it is 
highly advantageous if you have a string of ZK endpoints today, and tomorrow 
you simply replace this string with a string of master endpoints. Probably its 
the same list of hostnames served to clients the same way the zk quorum string 
was previously served, with only the port numbers changing.

In contrast, what if we start exporting a subset of regionservers. How does 
that work? It's guaranteed to be different from how zk quorum strings worked. 
There are a lot more of them, they are randomly chosen (?), the connection info 
has to dynamic instead of static (how is that collected? published to 
clients?). I could go on. The alternative, described above, has been well 
thought out, IMHO. 

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-05-20 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112530#comment-17112530
 ] 

Andrew Kyle Purtell edited comment on HBASE-11288 at 5/20/20, 7:17 PM:
---

bq. I agree publishing all the regionserver locations wouldn't make sense, but 
I'm curious and probably missing something as to why can't we publish a subset 
of regionserver locations? 

Because we also want migrating service discovery and configuration 
considerations to closely align, to avoid having to rethink how clients 
discover cluster service. Right now operators have to pass around a string of 
zookeeper endpoints. As you know zookeeper quorums are limited in size by how 
they scale writes. And, in many deployments, ZK and masters might be colocated, 
because they are both _meta_ services (metadata and coordination), and their 
deployments have the same shape both in terms of number of instances and 
placement with respect to failure domains. Sum these considerations, it is 
highly advantageous if you have a string of ZK endpoints today, and tomorrow 
you simply replace this string with a string of master endpoints. Probably its 
the same list of hostnames served to clients the same way the zk quorum string 
was previously served, with only the port numbers changing.

In contrast, what if we start exporting a subset of regionservers. How does 
that work? It's guaranteed to be different from how zk quorum strings worked. 
There are a lot more of them, they are randomly chosen (?), the connection info 
has to dynamic instead of static (how is that collected? published to 
clients?). I could go on. The alternative, described above, has been well 
thought out, IMHO. 


was (Author: apurtell):
bq. I agree publishing all the regionserver locations wouldn't make sense, but 
I'm curious and probably missing something as to why can't we publish a subset 
of regionserver locations? 

Because we also want migrating service discovery and configuration 
considerations to closely align, to avoid having to rethink how clients 
discover cluster service. Right now operators have to pass around a string of 
zookeeper endpoints. As you know zookeeper quorums are limited in size by how 
they scale writes. And, in many deployments, ZK and masters might be colocated, 
because they are both metadata services, and their deployments have the same 
shape both in terms of number of instances and placement with respect to 
failure domains. Sum these considerations, it is highly advantageous if you 
have a string of ZK endpoints today, and tomorrow you simply replace this 
string with a string of master endpoints. Probably its the same list of 
hostnames served to clients the same way the zk quorum string was previously 
served, with only the port numbers changing.

In contrast, what if we start exporting a subset of regionservers. How does 
that work? It's guaranteed to be different from how zk quorum strings worked. 
There are a lot more of them, they are randomly chosen (?), the connection info 
has to dynamic instead of static (how is that collected? published to 
clients?). I could go on. The alternative, described above, has been well 
thought out, IMHO. 

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-05-20 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112539#comment-17112539
 ] 

Andrew Kyle Purtell edited comment on HBASE-11288 at 5/20/20, 7:03 PM:
---

bq. But anyway, the ConnectionRegsitry is pluggable, so for users who can not 
control things other than HBase, they could use the new registry implementation 
in HBASE-18095 to reduce the load of zookeeper, 

[~zhangduo] If we are going to discuss this in this level of detail, I feel I 
have to enumerate why we built it, which doesn't have anything to do with load 
per se, just to clarify:
- For configuring for fail fast, having to think about zk connection 
configuration particulars in addition to HBase RPC configuration is doable, but 
clumsy, and limiting, and not always done correctly. It matters when you 
operate at scale and have a number of different internal customers with 
different expectations about retry or fail-fast behavior. A monolithic deploy / 
service organization may not have this concern, which is fine, it's optional. 
(See the config sketch after this list.)
- It's a security problem that zk is exposed to clients. ZK's security model is 
problematic. (3.5 and up can be better with TLS, requiring successful client 
and server cert-based auth before accepting any requests, but we don't support 
the ZK TLS transport out of the box actually.) For operators with this concern, 
now they can isolate the ZK service from end users with network or host ACLs, 
and HBase can service those clients still. 
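
A sketch of the two points above as client configuration: selecting a registry 
implementation (the ConnectionRegistry is pluggable) and keeping the fail-fast 
knobs purely in HBase RPC settings rather than partly in ZK client settings. 
The registry key and class name are assumptions about the 2.3+ client, not 
something asserted in this thread; verify them for your version.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class RegistrySelectionSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Pick the master-based registry instead of the default ZK-based one.
    conf.set("hbase.client.registry.impl",
        "org.apache.hadoop.hbase.client.MasterRegistry");
    conf.set("hbase.masters",
        "master1.example.com:16000,master2.example.com:16000");
    // Fail-fast behaviour is now tuned in one place (HBase RPC); there are no
    // ZK connection particulars to get right as well.
    conf.setInt("hbase.client.retries.number", 2);
    conf.setInt("hbase.rpc.timeout", 5000);
  }
}
{code}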


was (Author: apurtell):
bq. But anyway, the ConnectionRegsitry is pluggable, so for users who can not 
control things other than HBase, they could use the new registry implementation 
in HBASE-18095 to reduce the load of zookeeper, 

[~zhangduo] We are going to discuss this in this level of detail, I feel I have 
to enumerate why we built it, which doesn't have anything to do with load per 
se, just to clarify:
- For configuring for fail fast, having to think about zk connection 
configuration particulars in addition to HBase RPC configuration is doable, but 
clumsy, and limiting, and not always done correctly. It matters when you 
operate at scale and have a number of different internal customers with 
different expectations about retry or fail-fast behavior. A monolithic deploy / 
service organization may not have this, which is fine, it's optional. 
- It's a security problem that zk is exposed to clients. ZK's security model is 
problematic. (3.5 and up can be better with TLS, requiring successful client 
and server cert-based auth before accepting any requests, but we don't support 
the ZK TLS transport out of the box actually.) For operators with this concern, 
now they can isolate the ZK service from end users with network or host ACLs, 
and HBase can service those clients still. 

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-05-20 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112539#comment-17112539
 ] 

Andrew Kyle Purtell edited comment on HBASE-11288 at 5/20/20, 7:03 PM:
---

bq. But anyway, the ConnectionRegsitry is pluggable, so for users who can not 
control things other than HBase, they could use the new registry implementation 
in HBASE-18095 to reduce the load of zookeeper, 

[~zhangduo] If we are going to discuss this in this level of detail, I feel I 
have to enumerate why we built it, which doesn't have anything to do with load 
per se, just to clarify:
- For configuring for fail fast, having to think about zk connection 
configuration particulars in addition to HBase RPC configuration is doable, but 
clumsy, and limiting, and not always done correctly. It matters when you 
operate at scale and have a number of different internal customers with 
different expectations about retry or fail-fast behavior. A monolithic deploy / 
service organization may not have this concern, which is fine, it's optional. 
- It's a security problem that zk is exposed to clients. ZK's security model is 
problematic. (3.5 and up can be better with TLS, requiring successful client 
and server cert-based auth before accepting any requests, but we don't support 
the ZK TLS transport out of the box actually.) For operators with this concern, 
now they can isolate the ZK service from end users with network or host ACLs, 
and HBase can service those clients still. 


was (Author: apurtell):
bq. But anyway, the ConnectionRegsitry is pluggable, so for users who can not 
control things other than HBase, they could use the new registry implementation 
in HBASE-18095 to reduce the load of zookeeper, 

[~zhangduo] If we are going to discuss this in this level of detail, I feel I 
have to enumerate why we built it, which doesn't have anything to do with load 
per se, just to clarify:
- For configuring for fail fast, having to think about zk connection 
configuration particulars in addition to HBase RPC configuration is doable, but 
clumsy, and limiting, and not always done correctly. It matters when you 
operate at scale and have a number of different internal customers with 
different expectations about retry or fail-fast behavior. A monolithic deploy / 
service organization may not have this, which is fine, it's optional. 
- It's a security problem that zk is exposed to clients. ZK's security model is 
problematic. (3.5 and up can be better with TLS, requiring successful client 
and server cert-based auth before accepting any requests, but we don't support 
the ZK TLS transport out of the box actually.) For operators with this concern, 
now they can isolate the ZK service from end users with network or host ACLs, 
and HBase can service those clients still. 

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-05-20 Thread Francis Christopher Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112035#comment-17112035
 ] 

Francis Christopher Liu edited comment on HBASE-11288 at 5/20/20, 10:33 AM:


{quote}
The solution is to make meta splittable... But this is not possible for root 
table right?
{quote}
Right but there's being able to scale and there's getting hammered/spammed. For 
the latter you could still run into issues and still need a solution. 

{quote}
Well, for me I would say they are not the same... The location for root is on 
zk while meta is in root, and root can not be split but meta can...
{quote}
True but they are both tables. Operationally that makes things more intuitive. 
Eg load balancing across servers etc.




was (Author: toffer):
{quote}
The solution is to make meta splittable... But this is not possible for root 
table right?
{quote}
There's being able to scale and there's getting hammered/spammed. For the 
latter you could still run into issues and still need a solution. 

{quote}
Well, for me I would say they are not the same... The location for root is on 
zk while meta is in root, and root can not be split but meta can...
{quote}
True but they are both tables. Operationally that makes things more intuitive. 
Eg load balancing across servers etc.



> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-05-20 Thread Francis Christopher Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112028#comment-17112028
 ] 

Francis Christopher Liu edited comment on HBASE-11288 at 5/20/20, 10:25 AM:


{quote}
I think we should figure out how this will work. Can be done in parallel in 
another issue. Baseline is old clients keep working.
{quote}
Yep, let's make backward compatibility work. The simplest would be to just 
proxy requests, like the REST or Thrift server does now, if a request for meta 
hits the regionserver hosting root. Should be fairly straightforward. Yeah, 
maybe another branch on top of this one, just to not muck things up as we are 
trying stuff.

{quote}
The tiering of assignment – root going out before anything else, then all of 
the meta regions and then user-space regions. And that ROOT is just so ugly... 
the way the rows are done; the crazy comparator.
{quote}
Tiering - yeah this is a tough one since root holds meta locations... it 
invariably requires root to be available for writing. What is your concern 
here... the complexity of the process? 

Ugly root - the way the row keys are done? The nested thing? It's not that 
bad; at least the logic is being reused. Though we probably don't need the 
"hbase:meta," prefix on all rows... though I'm not sure what we gain?

{quote}
Currently we have meta and namespace and acls and we still don't get it right 
(Duo helped us out here by merging namespace into meta table in master branch).
{quote}
ACLs I've never had an issue with? Namespace AFAIK needed attention and got 
kicked around a bit? It's better that it's in hbase:meta now... no one will 
ignore that table, and there's less specialized code. This is a bit different 
from the tiering assignment issue though?

{quote}
We should do up a bit of a doc on how this will work; how it will work for old 
and new clients (e.g. handing out the whole meta table when client does meta 
location lookup); how the proxying will be done so old clients will work when 
meta splits, etc.
{quote}
Sounds good. Do you want to brainstorm? Or should I write a proposal then we 
work on it?




was (Author: toffer):
{quote}
I think we should figure out how this will work. Can be done in parallel in 
another issue. Baseline is old clients keep working.
{quote}
Yep let's make backward compatibility work. The simplest would be to just proxy 
requests like rest or thrift server does now. If a request for meta hits the 
regionserver hosting root. Should be fairly straightforward. Yeah another 
branch on top of this one maybe just to not muck things up as we are trying 
stuff.

{quote}
The tiering of assignment – root going out before anything else, then all of 
the meta regions and then user-space regions. And that ROOT is just so ugly... 
the way the rows are done; the crazy comparator.
{quote}
Tiering - yeah this is a tough one since root holds meta locationsit 
invariable requires root to be available for writing. What is your concern 
here...the complexity of the process? 

Ugly root - The way the row keys are done?  The nested thing? It's not that bad 
at least the logic is being reused. Tho we probably don't need the 
"hbase:meta," prefix on all rowstho I"m not sure what we gain?

{quote}
Currently we have meta and namespace and acls and we still don't get it right 
(Duo helped us out here by merging namespace into meta table in master branch).
{quote}
ACLs I've never had an issue with? Namespace AFAIK needed attention and got 
kicked around a bit? It's better it's in hbase:meta now...no one will ignore 
that table. This is a bit different from tiering assignment issue tho?

{quote}
We should do up a bit of a doc on how this will work; how it will work for old 
and new clients (e.g. handing out the whole meta table when client does meta 
location lookup); how the proxying will be done so old clients will work when 
meta splits, etc.
{quote}
Sounds good. Do you want to brainstorm? Or should I write a proposal then we 
work on it?



> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-05-20 Thread Francis Christopher Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111992#comment-17111992
 ] 

Francis Christopher Liu edited comment on HBASE-11288 at 5/20/20, 9:57 AM:
---

{quote}
Just do not see any difference if we just have a single root region on a region 
server? For all the cases, the client will also harmmering the region server 
which hosts the root region and bring down all the cluster...
{quote}

IMHO this type of workload is a regionserver responsibility and hence there are 
advantages to it being done on the regionserver. 

1. AFAIK hbase:meta today is getting hammered? If so, we'll likely want to fix 
that if we haven't already? If so, then whatever fix/enhancement we make, the 
root table can take advantage of as well? In which case it gives simplicity 
and code reuse, as we are applying one solution to two problems. Although we'd 
have to add code for root getting assigned first etc, which I believe should 
be relatively straightforward with procedures.

On the other hand, if we use the master and backup masters we will be giving 
them a new specialized responsibility, which will introduce new specialized 
code. And then we would need to introduce more code for the master(s) to not 
get hammered. 

2. From an operations perspective it would be easier for operations folks to 
reason about things, as the way both catalog tiers are handled is the same and 
so the way to manage them if there are issues is the same.

3. In recovery scenarios the master is already busy doing its master work, so 
it would be better if it could focus its resources on that, as it does not 
horizontally scale.

Please correct me if I'm making any wrong assumptions here. Let me know what 
you think? 





was (Author: toffer):
{quote}
Just do not see any difference if we just have a single root region on a region 
server? For all the cases, the client will also harmmering the region server 
which hosts the root region and bring down all the cluster...
{quote}

IMHO this type of workload is a regionserver responsibility and hence there are 
advantages to it being done on the regionserver. 

1. AFAIK hbase:meta today is getting hammered? If so, we'll likely want to fix 
that if we haven't already? If so then whatever fix/enhancement we did the root 
table can take advantage of as well? In which case it gives simplicity and code 
reuse as we are applying one solution to two problems. Although we'd have to 
add code for root getting assigned first etc, which I believe should be 
relatively straightforward with procedures.

On the other hand if use the master and backup masters we will be giving them a 
new specialized responsibility, which will introduce new specialized code. And 
then we would need to introduce more code for the master(s) to not get 
hammered. 

2. From an operations perspective it would be easier for operations folks to 
reason about things as the way both catalog tiers are handled are the same and 
so the way to manage them if there are issues are the same.

Please correct me if I'm making any wrong assumptions here. Let me know what 
you think? 




> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-05-19 Thread Francis Christopher Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111060#comment-17111060
 ] 

Francis Christopher Liu edited comment on HBASE-11288 at 5/19/20, 10:42 AM:


{quote}
We store the procedure data in a local HRegion on master.
{quote}
I see thanks for the explanation and the link.

{quote}
And on client accessing meta, I think for root region it is fine. We do not 
mirror all the requests to region servers to master, when will only contact 
master when locating meta, which should be rarely happen. 
{quote}
I can think of a couple of cases where it might not be good/ideal:
1. Large batch jobs that do lookups on an hbase table in tasks would hit it on 
startup. Same for large application restarts (eg storm topologies, etc).
2. In the event of failure, degradation is not as graceful: if clients are not 
able to find meta regions they will start hammering the master when the master 
is already busy trying to recover the cluster.
3. Generally just misbehaving actors (eg poorly written applications) or buggy 
third-party client implementations that users attempt to use. 

{quote}
And now we even have a MasterRegistry on 2.x, where we move the locating meta 
requests from zk to master.
{quote}
I think what we might do is add a resolver plugin so all master registry 
requests just go to the backup masters. I've not looked at the code yet, but 
maybe we can have the backup masters serve out the local HRegion content? IMHO 
ideally the registry should've been served out of the regionservers. Today we 
have a "system" regionserver group where all the system tables are served from 
(including root). With the new setup we would have a system group and an extra 
set of master servers for serving out root and registry stuff, and the extra 
operational nuance of backup masters not just being backup but doing active 
work for the system. 

Just laying out my concerns; let me know what you guys think, and whether it's 
reasonable or unreasonable.



was (Author: toffer):
{quote}
We store the procedure data in a local HRegion on master.
{quote}
I see thanks for the explanation and the link.

{quote}
And on client accessing meta, I think for root region it is fine. We do not 
mirror all the requests to region servers to master, when will only contact 
master when locating meta, which should be rarely happen. 
{quote}
I can think of a couple of cases were it might not be good/ideal:
1. Large batch jobs that does lookups on an hbase table in tasks would hit it 
on startup. Same for large application restarts (eg storm topologies, etc).
2. In the event of failure degradation is not as graceful, as if clients are 
not able to find meta  regions they will start hammering the master when the 
master is already busy trying to recover the cluster.
3. Generally just misbehaving actors (eg poorly written applications) or buggy 
third party client implementations that users attempt to use. 

{quote}
And now we even have a MasterRegistry on 2.x, where we move the locating meta 
requests from zk to master.
{quote}
I think what we might do is add a resolver plugin so all master registry 
requests just go to the backup masters. I've not looked at the code yet but 
maybe we can have the backup masters serve out the local hregion content from 
the backup masters?  IMHO ideally  the registry should've been served out of 
the regionservers. Today we have a "system" regionserver group where all the 
system tables served out of (including root). With the new setup we would have 
a system group and an extra set of master servers for serving out root and 
registry stuff and the extra nuance of backup masters not just being backup but 
doing active work for the system. 

Just laying out my concerns let me know what you guys think?  


> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-05-18 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110755#comment-17110755
 ] 

Michael Stack edited comment on HBASE-11288 at 5/19/20, 1:20 AM:
-


Can we split meta in a way such that older clients keep working? Perhaps old 
clients get a proxying Region that fields requests to the actual split meta 
table so they keep working?

Can we NOT do a ROOT table w/ its crazy meta-meta comparator?


Yeah, when the registry is hosted by the Master, the master now carries some 
location ops we used to farm out to zk. A Master-hosted registry could host 
Meta table Region locations too... seems apt for a Registry. We've not been 
good at tiers of assign; i.e. ROOT first, then META regions.
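
Roughly the shape being imagined here; the interface and method names below are 
illustrative only, not the real client-internal registry interface:

{code:java}
import java.util.List;
import java.util.concurrent.CompletableFuture;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.ServerName;

// Illustrative sketch: a master-hosted registry that, besides cluster id and
// active-master lookup, also hands out hbase:meta Region locations, so clients
// never need to know whether meta has split.
interface ClusterRegistrySketch {
  CompletableFuture<String> getClusterId();
  CompletableFuture<ServerName> getActiveMaster();
  CompletableFuture<List<HRegionLocation>> getMetaRegionLocations();
}
{code}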


was (Author: stack):

Can we split meta in a way such that older clients keep working? Perhaps old 
clients get a proxying Region that fields requests to the actual split meta 
table so they keep working?

Can we NOT do a ROOT table w/ its crazy meta-meta comparator?


Yeah, when registry is hosted by Master, master now carries some location ops 
we used farm out to zk. Master-hosted registry could host Meta table Region 
locations too... seems apt for a Registry.

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Umbrella
>  Components: meta
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2020-04-02 Thread Francis Christopher Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073594#comment-17073594
 ] 

Francis Christopher Liu edited comment on HBASE-11288 at 4/2/20, 10:35 AM:
---

Hi, I pushed up my current WIP/draft patch for feedback as a PR on 
[github|https://github.com/apache/hbase/pull/1418]. Please let me know what you 
guys think.


was (Author: toffer):
Hi, I pushed up my current WIP/Draft patch for feedback as  PR on 
[github|https://github.com/apache/hbase/pull/1418]. 

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2019-10-01 Thread Francis Christopher Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942317#comment-16942317
 ] 

Francis Christopher Liu edited comment on HBASE-11288 at 10/1/19 9:11 PM:
--

{quote}
I'd like to see old clients keep working though meta has split under them. 
Perhaps the serverName in the MetaRegionServer znode proxies requests when the 
region asked for is named 'hbase:meta,,1.1588230740' to actual meta table whose 
regions could be elsewhere on the cluster. I suppose 'hbase:meta,,1.1588230740' 
would have to be the old ROOT too?
{quote}
Oh, interesting idea. How about this: instead of proxying, when a request for 
meta hits the RS with the ROOT region, we throw a RegionMovedException to 
redirect the client to the proper meta region RS. On the RS holding the meta 
region we translate the 'hbase:meta,,1.1588230740' region info to the 
appropriate meta region info if split. This way we avoid proxying, which seems 
like it would make things a bit more complicated?

{quote}
Lets do a one-pager.
{quote}
Sounds good.


was (Author: toffer):
{quote}
I'd like to see old clients keep working though meta has split under them. 
Perhaps the serverName in the MetaRegionServer znode proxies requests when the 
region asked for is named 'hbase:meta,,1.1588230740' to actual meta table whose 
regions could be elsewhere on the cluster. I suppose 'hbase:meta,,1.1588230740' 
would have to be the old ROOT too?
{quote}
Oh interesting idea. How about instead of proxying we could throw a 
RegionMovedException to redirect to the proper meta region RS when a requests 
for meta hits the RS with the ROOT region. On the RS holding the meta region we 
translate the 'hbase:meta,,1.1588230740' region info to the appropriate meta 
region info if split. This way we can avoid proxying which seems to make things 
a bit more complicated?

{quote}
Lets do a one-pager.
{quote}
Sounds good.

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2019-10-01 Thread Francis Christopher Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942317#comment-16942317
 ] 

Francis Christopher Liu edited comment on HBASE-11288 at 10/1/19 9:10 PM:
--

{quote}
I'd like to see old clients keep working though meta has split under them. 
Perhaps the serverName in the MetaRegionServer znode proxies requests when the 
region asked for is named 'hbase:meta,,1.1588230740' to actual meta table whose 
regions could be elsewhere on the cluster. I suppose 'hbase:meta,,1.1588230740' 
would have to be the old ROOT too?
{quote}
Oh, interesting idea. How about, instead of proxying, we throw a 
RegionMovedException to redirect to the proper meta region RS when a request 
for meta hits the RS with the ROOT region. On the RS holding the meta region we 
translate the 'hbase:meta,,1.1588230740' region info to the appropriate meta 
region info if split. This way we avoid proxying, which seems like it would 
make things a bit more complicated?

{quote}
Lets do a one-pager.
{quote}
Sounds good.


was (Author: toffer):
{quote}
I'd like to see old clients keep working though meta has split under them. 
Perhaps the serverName in the MetaRegionServer znode proxies requests when the 
region asked for is named 'hbase:meta,,1.1588230740' to actual meta table whose 
regions could be elsewhere on the cluster. I suppose 'hbase:meta,,1.1588230740' 
would have to be the old ROOT too?
{quote}
Oh interesting idea. How about instead of proxying we could throw a 
RegionMovedException to the proper meta region destination when a requests for 
meta hits the RS with the ROOT region. On the RS holding the meta region we 
translate the 'hbase:meta,,1.1588230740' region info to the appropriate meta 
region info if split. This way we can avoid proxying which seems to make things 
a bit more complicated?

{quote}
Lets do a one-pager.
{quote}
Sounds good.

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2019-09-30 Thread Francis Christopher Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941446#comment-16941446
 ] 

Francis Christopher Liu edited comment on HBASE-11288 at 10/1/19 1:21 AM:
--

{quote}
How many rows in your biggest hbase:root? And how many regions in hbase:meta 
have you gotten to? 
{quote}
We have a cluster with 1.04 million regions; that same cluster has 119 
hbase:meta regions. We configured the split size to be small.
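
For reference, a hedged sketch of what configuring a small split size for 
hbase:meta could look like, using the 2.x-style Admin API for illustration and 
assuming the splittable-meta branch honors hbase:meta's MAX_FILESIZE like any 
other table; the 256 MB value is just an example, not what was actually used:

{code:java}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class ShrinkMetaSplitSize {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      // Lower hbase:meta's max store file size so it splits into many small
      // regions (256 MB here is only an example value).
      TableDescriptor current = admin.getDescriptor(TableName.META_TABLE_NAME);
      TableDescriptor smallSplits = TableDescriptorBuilder.newBuilder(current)
          .setMaxFileSize(256L * 1024 * 1024)
          .build();
      admin.modifyTable(smallSplits);
    }
  }
}
{code}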

{quote}
A lot has changed since 1.3 but patch will help.
{quote}
Yes, it shows two important things: 1. The approach of adding a splittable 
hbase:meta is fairly straightforward and can be applied to other branches (even 
tho a lot has changed), 2. The approach works :-). Next question is where do we 
go from here? A patch for master?


was (Author: toffer):
{quote}
How many rows in your biggest hbase:root? And how many regions in hbase:meta 
have you gotten to? 
{quote}
We have a cluster that has 1.04 million regions. That same cluster has 119 
hbase:meta regions. We configured the split size to be small.  

{quote}
A lot has changed since 1.3 but patch will help.
{quote}
Yes, it shows two important things: 1. The approach of adding hbase:meta is 
fairly straightforward and can be applied to other branches (even tho a lot has 
changed), 2. The approach works :-). Next question is where do we go from here? 
A patch for master?

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2019-09-30 Thread Francis Christopher Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940663#comment-16940663
 ] 

Francis Christopher Liu edited comment on HBASE-11288 at 9/30/19 6:06 AM:
--

Apologies for the delay; here's a short 
[writeup|https://docs.google.com/document/d/1cs_sRC5xbK2JPdw99kqUqUGMJ0yHier5QPx21al619g/edit].
 Let me know if it needs more detail, or feel free to comment. BTW I had to 
rebase the branch.

 

 


was (Author: toffer):
Apologies for the delay here's a short 
[writeup|[https://docs.google.com/document/d/1cs_sRC5xbK2JPdw99kqUqUGMJ0yHier5QPx21al619g/edit]]
 . Let me know if it needs more detail or feel free to comment. BTW I had to 
rebase the branch.

 

 

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2019-09-30 Thread Francis Christopher Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940663#comment-16940663
 ] 

Francis Christopher Liu edited comment on HBASE-11288 at 9/30/19 6:04 AM:
--

Apologies for the delay; here's a short 
[writeup|https://docs.google.com/document/d/1cs_sRC5xbK2JPdw99kqUqUGMJ0yHier5QPx21al619g/edit].
 Let me know if it needs more detail, or feel free to comment. BTW I had to 
rebase the branch.

 

 


was (Author: toffer):
Apologies for the delay here's a short 
[writeup|[https://docs.google.com/document/d/1cs_sRC5xbK2JPdw99kqUqUGMJ0yHier5QPx21al619g/edit]].
 Let me know if it needs more detail or feel free to comment. BTW I had to 
rebase the branch.

 

 

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Francis Christopher Liu
>Assignee: Francis Christopher Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HBASE-11288) Splittable Meta

2019-09-24 Thread Francis Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-11288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16936585#comment-16936585
 ] 

Francis Liu edited comment on HBASE-11288 at 9/24/19 9:16 AM:
--

Here's a WIP branch:

[https://github.com/francisliu/hbase/tree/apache_1.3_splitmeta]

I'm working on stripping the changes down to only what's needed to split meta. 
The rest can come as follow-ons if needed. At its core there's actually not 
much change; the patch is large partly because of renames that touch a lot of 
files. So with this branch we can get a very good picture of the changes 
needed. I'm still running the unit tests. Will move this to a feature branch in 
the hbase repo as soon as most of the tests are passing. I'll add more info 
tomorrow.

The next question is what we want to do: should I get everything we want 
working on the 1.3 branch, or should I port this to trunk or 1.x, or...?

 


was (Author: toffer):
Here's a WIP branch:

[https://github.com/francisliu/hbase/tree/apache_1.3_splitmeta]

I'm working on stripping down most of the changes to only what's needed to 
split meta. The rest can come as follow-ons if needed.  At it's core there's 
actually not much change, the patch is partly big because there's a lot of 
renames which are touching a lot of files. So with this branch we can get a 
very good picture of the changes needed. I'm still running the unit tests. Will 
move this to a feature branch in the hbase repo as soon as most of the tests 
are passing. 

Next question is what we want to do. Should I get everything we want working on 
the 1.3 branch or should I port this to trunk or 1.x or?

 

> Splittable Meta
> ---
>
> Key: HBASE-11288
> URL: https://issues.apache.org/jira/browse/HBASE-11288
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Francis Liu
>Assignee: Francis Liu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)