[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17179617#comment-17179617 ]

Guanghao Zhang commented on HBASE-23035:

{quote}[https://github.com/apache/hbase/blob/c2e0cf989e4a86169219161d4d889db80288e636/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L556]
{quote}
Introduced by HBASE-18036. I am +1 on adding a config to control this.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.2.1, 2.1.6
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0, 2.1.7, 2.2.2
>
>
> Currently, if one RS crashes, its regions try to reuse their old location
> when they are redeployed. But one RS has only 3 threads for opening regions
> by default, so if an RS has hundreds of regions, failover is very slow.
> Assigning to the same RS may give good locality if the DataNode is deployed
> on the same host, but slower failover makes availability worse, and locality
> is not a big deal when HBase is deployed in the cloud.
> This was introduced by HBASE-18946.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
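The issue description's argument about opener threads can be made concrete with a little arithmetic. The sketch below is a back-of-the-envelope model, not HBase code: it assumes every region open takes the same time and that each server works through its share in batches of `openThreads`. The class and method names, and the numbers in `main`, are illustrative assumptions.

```java
// Back-of-the-envelope model, not HBase code: each RegionServer opens regions
// with a fixed pool of opener threads, and every open takes the same time.
final class FailoverEstimate {
    /** Number of sequential "open rounds" one server needs for its regions. */
    static int roundsPerServer(int regionsOnServer, int openThreads) {
        return (regionsOnServer + openThreads - 1) / openThreads; // ceiling division
    }

    /** Retain: every region of the crashed RS goes back to a single server. */
    static int retainRounds(int regions, int openThreads) {
        return roundsPerServer(regions, openThreads);
    }

    /** Round-robin: regions are spread as evenly as possible over the live servers. */
    static int roundRobinRounds(int regions, int liveServers, int openThreads) {
        int busiest = (regions + liveServers - 1) / liveServers; // largest share
        return roundsPerServer(busiest, openThreads);
    }

    public static void main(String[] args) {
        int regions = 300, threads = 3, servers = 10; // illustrative numbers only
        System.out.println("retain rounds:      " + retainRounds(regions, threads));
        System.out.println("round-robin rounds: " + roundRobinRounds(regions, servers, threads));
    }
}
```

Under these assumptions, 300 regions retained on one server need 100 rounds of opens, while spreading them over 10 servers needs 10 rounds on the busiest server.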
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17179548#comment-17179548 ]

Anoop Sam John commented on HBASE-23035:

Yes, I have seen this retainAssign behavior in 1.x-based clusters. What I mean is that in some parts of the AM we have configs that decide whether locality is to be considered, and the calculation is based on that. So a locality-sensitive cluster can have such a config (maybe a new one) turned on.
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17179517#comment-17179517 ]

Guanghao Zhang commented on HBASE-23035:

{quote}IMO, we should make retain assignment configurable, so that user can decide as per their use case.
{quote}
OK. If you need this feature, please open a new issue for it. Thanks.
{quote}But currently branch-1 retains the assignment during SCP if same RS came up (please correct me if I miss something),
{quote}
Let me take a look at this.
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17179347#comment-17179347 ]

Pankaj Kumar commented on HBASE-23035:

{quote}This jira change the "retain" assignment to round-robin assignment, which is same with 1.x.x version. This change will make the failover faster and improve availability.
{quote}
Yeah [~zghao], failover will be faster and regions will become available sooner, but this will hurt scan performance in non-cloud scenarios.

But currently branch-1 retains the assignment during SCP if the same RS comes back up (please correct me if I am missing something),

[https://github.com/apache/hbase/blob/c2e0cf989e4a86169219161d4d889db80288e636/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L556]

IMO, we should make retain assignment configurable, so that users can decide as per their use case.
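A configurable retain assignment, as proposed in this comment, could look roughly like the sketch below. It is only an illustration of the idea: the property key `example.scp.retain.assignment` is hypothetical (the thread does not name one), `java.util.Properties` stands in for the real configuration object, and the default is round-robin to match the behavior this issue introduced.

```java
import java.util.Properties;

// Illustration only: a boolean switch that picks the assignment strategy.
final class RetainSwitchSketch {
    // Hypothetical key, invented for this example; the thread does not name one.
    static final String RETAIN_KEY = "example.scp.retain.assignment";

    enum Strategy { RETAIN, ROUND_ROBIN }

    /** Retain only when the operator opted in AND the old server is back online. */
    static Strategy choose(Properties conf, boolean oldServerOnline) {
        boolean retain = Boolean.parseBoolean(conf.getProperty(RETAIN_KEY, "false"));
        return (retain && oldServerOnline) ? Strategy.RETAIN : Strategy.ROUND_ROBIN;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(choose(conf, true)); // default: round-robin

        conf.setProperty(RETAIN_KEY, "true");   // locality-sensitive cluster opts in
        System.out.println(choose(conf, true));
        System.out.println(choose(conf, false)); // old server still down
    }
}
```

Falling back to round-robin when the old server is not online keeps the fast-failover path for the case where the crashed server never returns.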
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17178685#comment-17178685 ]

Bo Cui commented on HBASE-23035:

[https://github.com/apache/hbase/blob/c2e0cf989e4a86169219161d4d889db80288e636/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ServerCrashProcedure.java#L556]

[~anoop.hbase] Is this what you are talking about?
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176848#comment-17176848 ]

Anoop Sam John commented on HBASE-23035:

[~Bo Cui] So your use case is an entire cluster restart, and as part of that you want the regions to come back to their old RSs (as much as possible) so that locality can be preserved.
There is a config around the LB that controls whether data locality is considered when deciding the plan. Can we make use of the same thing? Maybe not. I don't remember the details of that config. [~zghao]
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176744#comment-17176744 ]

Bo Cui commented on HBASE-23035:

[~zghao] During startup, HBase needs to assign regions to their previous RS so that scan performance is not affected, so we can add a config to solve this problem.
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175295#comment-17175295 ]

Guanghao Zhang commented on HBASE-23035:

This jira changes the "retain" assignment to round-robin assignment, which is the same as in the 1.x.x versions. This change makes failover faster and improves availability.
{quote}Or are there some configs to control this?
{quote}
There are no configs to control this. The new behavior is the default behavior.
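The difference between the two strategies named in this comment can be sketched with a toy plan builder. This is a minimal model using plain strings for regions and servers; it is not the HBase AssignmentManager or LoadBalancer API, and all names in it are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model with plain strings; not the HBase AssignmentManager or LoadBalancer API.
final class AssignmentSketch {
    /** Retain: every region goes back to the server that hosted it before the crash. */
    static Map<String, List<String>> retain(List<String> regions, String oldServer) {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        plan.put(oldServer, new ArrayList<>(regions));
        return plan;
    }

    /** Round-robin: region i goes to live server (i mod n). */
    static Map<String, List<String>> roundRobin(List<String> regions, List<String> liveServers) {
        if (liveServers.isEmpty()) {
            throw new IllegalArgumentException("no live servers");
        }
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String server : liveServers) {
            plan.put(server, new ArrayList<>());
        }
        for (int i = 0; i < regions.size(); i++) {
            plan.get(liveServers.get(i % liveServers.size())).add(regions.get(i));
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> regions = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        System.out.println(retain(regions, "rs1"));
        System.out.println(roundRobin(regions, Arrays.asList("rs1", "rs2", "rs3")));
    }
}
```

With retain, one server receives all five regions; with round-robin, no server receives more than two, so the opens proceed in parallel across servers at the cost of locality.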
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175277#comment-17175277 ]

Anoop Sam John commented on HBASE-23035:

[~zghao] So how does the fix work now? Will the SSH round-robin the assignment even if the down RS has come back by the time the AM starts its work? Or are there some configs to control this? Or is there some other way? Sorry, I did not see the patch.
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175267#comment-17175267 ]

Guanghao Zhang commented on HBASE-23035:

{quote}hi, but some hbase cluster is not on the cloud,
{quote}
Yes. The core problem here is the slow failover when all regions are assigned to one RegionServer.
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174189#comment-17174189 ]

Bo Cui commented on HBASE-23035:

{quote}And the locality is not big deal when deploy HBase on cloud.
{quote}
[~zghao] Hi, but some HBase clusters are not in the cloud.
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940573#comment-16940573 ]

Hudson commented on HBASE-23035:

Results for branch master
[build #1487 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/1487/]: (x) *{color:red}-1 overall{color}*
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/1487//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/1487//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/1487//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940194#comment-16940194 ]

Hudson commented on HBASE-23035:

Results for branch branch-2.2
[build #644 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/644/]: (x) *{color:red}-1 overall{color}*
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/644//General_Nightly_Build_Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/644//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/644//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940178#comment-16940178 ]

Hudson commented on HBASE-23035:

Results for branch branch-2
[build #2305 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2305/]: (x) *{color:red}-1 overall{color}*
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2305//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2305//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2305//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16940117#comment-16940117 ]

Hudson commented on HBASE-23035:

Results for branch branch-2.1
[build #1648 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1648/]: (/) *{color:green}+1 overall{color}*
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1648//General_Nightly_Build_Report/]

(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1648//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1648//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938247#comment-16938247 ]

Guanghao Zhang commented on HBASE-23035:

Ping [~ram_krish] for reviewing [https://github.com/apache/hbase/pull/652]
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938244#comment-16938244 ]

Guanghao Zhang commented on HBASE-23035:

Opened a new issue, HBASE-23078, for the LoadBalancer.
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938176#comment-16938176 ]

Guanghao Zhang commented on HBASE-23035:

There are two problems with the LoadBalancer.

1. The Cluster is supposed to represent the state of the whole cluster. But hasRegionReplica is false, so the cluster state is created only from the regions that need to be assigned, not from the whole cluster...
{code:java}
Cluster cluster = createCluster(servers, regions, false);
List<RegionInfo> unassignedRegions = new ArrayList<>();
roundRobinAssignment(cluster, regions, unassignedRegions, servers, assignments);

protected Cluster createCluster(List<ServerName> servers, Collection<RegionInfo> regions,
    boolean hasRegionReplica) {
  // Get the snapshot of the current assignments for the regions in question, and then create
  // a cluster out of it. Note that we might have replicas already assigned to some servers
  // earlier. So we want to get the snapshot to see those assignments, but this will only contain
  // replicas of the regions that are passed (for performance).
  Map<ServerName, List<RegionInfo>> clusterState = null;
  if (!hasRegionReplica) {
    clusterState = getRegionAssignmentsByServer(regions);
  } else {
    // for the case where we have region replica it is better we get the entire cluster's snapshot
    clusterState = getRegionAssignmentsByServer(null);
  }

  for (ServerName server : servers) {
    if (!clusterState.containsKey(server)) {
      clusterState.put(server, EMPTY_REGION_LIST);
    }
  }
  return new Cluster(regions, clusterState, null, this.regionFinder, rackManager);
}
{code}
2. The wouldLowerAvailability method only considers the primary regions. A replica region can't be assigned to the same server as its primary region, but it can be assigned to the same server as other replica regions.
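The second problem in the comment above can be illustrated with a toy check. This is a simplified reading of the comment, not the HBase `wouldLowerAvailability` implementation: the `Replica` type and both method names are hypothetical, and replica id 0 is assumed to denote the primary.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of the replica check described in the comment; all names are
// hypothetical and this is not the HBase wouldLowerAvailability implementation.
final class ReplicaCheckSketch {
    /** A region replica; replicaId 0 is assumed to be the primary. */
    static final class Replica {
        final String region;
        final int replicaId;

        Replica(String region, int replicaId) {
            this.region = region;
            this.replicaId = replicaId;
        }
    }

    /** Primary-only check: a clash is reported only if the server hosts the primary. */
    static boolean clashesPrimaryOnly(List<Replica> hosted, Replica candidate) {
        for (Replica r : hosted) {
            if (r.region.equals(candidate.region) && r.replicaId == 0) {
                return true;
            }
        }
        return false;
    }

    /** Stricter check: any replica of the same region on the server is a clash. */
    static boolean clashesAnyReplica(List<Replica> hosted, Replica candidate) {
        for (Replica r : hosted) {
            if (r.region.equals(candidate.region)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Replica> hosted = Arrays.asList(new Replica("t1", 1)); // server hosts replica 1
        Replica candidate = new Replica("t1", 2);                   // placing replica 2
        System.out.println("primary-only check sees a clash: " + clashesPrimaryOnly(hosted, candidate));
        System.out.println("any-replica check sees a clash:  " + clashesAnyReplica(hosted, candidate));
    }
}
```

In this model the primary-only check lets two secondary replicas of the same region share a server, which is the behavior the comment points out.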
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936684#comment-16936684 ]

Guanghao Zhang commented on HBASE-23035:

There may be a bug in StochasticLoadBalancer. I tried to use it to balance the cluster, but it cannot assign replicas to different servers... Anyway, this should be a separate issue. Let me dig more.
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936501#comment-16936501 ]

Guanghao Zhang commented on HBASE-23035:

{quote}So on restart you just want to leave the target location as null and allow the LB to take care of the location - right?
{quote}
Yes. The job should be done by the LB.
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936390#comment-16936390 ]

ramkrishna.s.vasudevan commented on HBASE-23035:

So on restart you just want to leave the target location as null and allow the LB to take care of the location - right?
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936247#comment-16936247 ]

stack commented on HBASE-23035:

[~zghao] HBASE-18946 was a long time ago. Sorry if we messed it up. Thanks for finding the hole in it.
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935654#comment-16935654 ]

Duo Zhang commented on HBASE-23035:

I think this should be done in the balancer.
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935638#comment-16935638 ]

Guanghao Zhang commented on HBASE-23035:

Ping [~ram_krish] [~anoop.hbase] [~stack] [~zhangduo] Any ideas?
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935637#comment-16935637 ]

Guanghao Zhang commented on HBASE-23035:

I read HBASE-18946 again. The initial problem was "Region replicas should be assigned to different servers", but the fix in HBASE-18946 does not look good. It uses round-robin assignment when creating a table with region replicas, and it retains the old location for region replicas on failover. But the load balancer still has a chance to break this and assign region replicas to the same server? I think the initial problem should be fixed in the load balancer, and failover does not need to retain the old deployment.
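
The invariant quoted in this comment ("Region replicas should be assigned to different servers") can be written down as a small check. The following is only a minimal sketch in Python, not HBase code: the function name `colocated_replicas` and the `(region_name, replica_id) -> server` mapping are assumptions made for illustration.

```python
from collections import defaultdict


def colocated_replicas(assignment):
    """Return the regions that have two or more replicas on one server.

    `assignment` is a hypothetical mapping of
    (region_name, replica_id) -> server name; it is not an HBase structure.
    """
    servers_by_region = defaultdict(list)
    for (region, _replica_id), server in assignment.items():
        servers_by_region[region].append(server)
    # A region violates the invariant if any server appears more than once.
    return sorted(
        region
        for region, servers in servers_by_region.items()
        if len(set(servers)) < len(servers)
    )
```

A check like this holds no matter which component chose the placement, which is the point made in the comment: if the balancer can later move replicas together, retaining locations at failover time does not by itself protect the invariant.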
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935539#comment-16935539 ]

Guanghao Zhang commented on HBASE-23035:

Opened PR#652 for an addendum patch.
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935500#comment-16935500 ]

Guanghao Zhang commented on HBASE-23035:

I found there is a unit test called TestRetainAssignmentOnRestart, but it did not fail on branch-2.2+. Let me dig more...
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935374#comment-16935374 ]

Hudson commented on HBASE-23035:

Results for branch branch-2 [build #2295 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2295/]: (x) -1 overall

Details (if available):

(/) +1 general checks
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2295//General_Nightly_Build_Report/]

(x) -1 jdk8 hadoop2 checks
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2295//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) -1 jdk8 hadoop3 checks
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2295//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) +1 source release artifact
-- See build output for details.

(/) +1 client integration test
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935335#comment-16935335 ]

Hudson commented on HBASE-23035:

Results for branch branch-2.2 [build #634 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/634/]: (x) -1 overall

Details (if available):

(/) +1 general checks
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/634//General_Nightly_Build_Report/]

(x) -1 jdk8 hadoop2 checks
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/634//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) +1 jdk8 hadoop3 checks
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/634//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) +1 source release artifact
-- See build output for details.

(/) +1 client integration test
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935311#comment-16935311 ]

Hudson commented on HBASE-23035:

Results for branch master [build #1474 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/1474/]: (x) -1 overall

Details (if available):

(/) +1 general checks
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/1474//General_Nightly_Build_Report/]

(x) -1 jdk8 hadoop2 checks
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/1474//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) -1 jdk8 hadoop3 checks
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/1474//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) +1 source release artifact
-- See build output for details.

(/) +1 client integration test
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935291#comment-16935291 ]

Guanghao Zhang commented on HBASE-23035:

Created a new PR#650 for branch-2.1.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
> Issue Type: Bug
> Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
> Now if one RS crashes, its regions try to use the old location when they
> are redeployed. But one RS has only 3 threads to open regions by default. If
> an RS has hundreds of regions, the failover is very slow. Assigning to the
> same RS may give good locality if the DataNode is deployed on the same host.
> But slower failover makes availability worse, and locality is not a big deal
> when HBase is deployed on the cloud.
> This was introduced by HBASE-18946.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935042#comment-16935042 ]

Guanghao Zhang commented on HBASE-23035:

Pushed to branch-2.2+.
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931235#comment-16931235 ]

Guanghao Zhang commented on HBASE-23035:

> We were always doing a round robin method in case of SCP right? I mean for non region replica cases?

After HBASE-18946, regions are retained on the old RS even when there are no region replicas.

> Retain region to the last RegionServer make the failover slower
> ---
>
> Key: HBASE-23035
> URL: https://issues.apache.org/jira/browse/HBASE-23035
> Project: HBase
> Issue Type: Bug
> Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6
> Reporter: Guanghao Zhang
> Assignee: Guanghao Zhang
> Priority: Major
>
> Now if one RS crashes, its regions try to use the old location when they
> are redeployed. But one RS has only 3 threads to open regions by default. If
> an RS has hundreds of regions, the failover is very slow. Assigning to the
> same RS may give good locality if the DataNode is deployed on the same host.
> But slower failover makes availability worse, and locality is not a big deal
> when HBase is deployed on the cloud.
> This was introduced by HBASE-18946.

--
This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931193#comment-16931193 ]

ramkrishna.s.vasudevan commented on HBASE-23035:

We were always doing a round robin method in case of SCP right? I mean for non region replica cases?
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931188#comment-16931188 ]

Guanghao Zhang commented on HBASE-23035:

> So the server that went down and immediately came back will have comparatively fewer regions, right?

Yes.

> So how, or whether, will that impact the cluster later?

The cluster will be balanced again when the balancer runs.
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931181#comment-16931181 ]

Anoop Sam John commented on HBASE-23035:

So the server that went down and immediately came back will have comparatively fewer regions, right? In effect, some of its old regions got moved to other live RSs. So how, or whether, will that impact the cluster later? What will happen when the balancer runs later?
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931047#comment-16931047 ]

Guanghao Zhang commented on HBASE-23035:

> You mean the RS is back online immediately?

Yes.

> the default behavior in the old days was to assign regions to all the live servers to bring them online soon...

Yes. This issue plans to change back to the old behavior.
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931025#comment-16931025 ]

Duo Zhang commented on HBASE-23035:

You mean the RS is back online immediately? If the RS is dead we will not assign regions to it... And IIRC, the default behavior in the old days was to assign regions to all the live servers to bring them online soon...
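
To make the trade-off in this thread concrete, here is a rough back-of-the-envelope sketch in Python. It is not HBase code, and it rests on simplifying assumptions: every region open takes about the same time, target servers have no other work queued, and regions are spread evenly. The function name `open_rounds` is made up for illustration; the default of 3 open threads per RS comes from the issue description.

```python
import math


def open_rounds(num_regions, num_servers, open_threads_per_server=3):
    """Estimate how many sequential batches of region opens are needed.

    Each server opens `open_threads_per_server` regions at a time, so the
    cluster opens num_servers * open_threads_per_server regions per batch.
    """
    if num_regions == 0:
        return 0
    return math.ceil(num_regions / (num_servers * open_threads_per_server))


# Retaining all 300 regions on the single restarted RS:
retained = open_rounds(300, num_servers=1)
# Spreading the same 300 regions over 10 live servers:
spread = open_rounds(300, num_servers=10)
```

Under these assumptions, retaining 300 regions on one RS needs 100 batches, while spreading them over 10 live servers needs 10, which is why assigning to all live servers brings regions online sooner at the cost of locality.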