[jira] [Commented] (HBASE-25625) StochasticBalancer CostFunctions needs a better way to evaluate region count distribution
[ https://issues.apache.org/jira/browse/HBASE-25625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579325#comment-17579325 ]

Clara Xiong commented on HBASE-25625:
-------------------------------------

Oh yes, slop was re-introduced and could be used here. It just needs more validation in real-life scenarios.

> StochasticBalancer CostFunctions needs a better way to evaluate region count distribution
> -----------------------------------------------------------------------------------------
>
> Key: HBASE-25625
> URL: https://issues.apache.org/jira/browse/HBASE-25625
> Project: HBase
> Issue Type: Improvement
> Components: Balancer, master
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Priority: Major
> Attachments: image-2021-10-05-17-17-50-944.png
>
> Currently CostFunctions including RegionCountSkewCostFunctions, PrimaryRegionCountSkewCostFunctions and all load cost functions calculate the unevenness of the distribution by getting the sum of deviation per region server. This simple implementation works when the cluster is small. But when the cluster gets larger with more region servers and regions, it doesn't work well with hot spots or a small number of unbalanced servers. The proposal is to use the standard deviation of the count per region server to capture the existence of a small portion of region servers with overwhelming load/allocation.
>
> TableSkewCostFunction uses the sum of the max deviation region per server for all tables as the measure of unevenness. It doesn't work in a very common scenario in operations. Say we have 100 regions on 50 nodes, two on each. We add 50 new nodes and they have 0 each. The max deviation from the mean is 1, compared to 99 in the worst case scenario of 100 regions on a single server. The normalized cost is 1/99 = 0.010 < default threshold of 0.05. The balancer wouldn't move. The proposal is to use the standard deviation of the count per region server to detect this scenario, generating a cost of 3.1/31 = 0.1 in this case.
>
> Patch is in test and will follow shortly.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
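The arithmetic in the quoted description can be checked with a small sketch. This is not HBase code: `max_deviation_cost` and `stdev_cost` are hypothetical helpers, and the normalization against the "everything on one server" worst case is an assumption chosen to match the 1/99 figure in the description. The standard-deviation variant here is scaled differently from the 3.1/31 quoted above but arrives at the same ratio of roughly 0.1.

```python
import math


def max_deviation_cost(counts):
    """Max deviation from the mean, scaled by the worst case
    (all regions on a single server). Hypothetical helper."""
    total = sum(counts)
    mean = total / len(counts)
    max_dev = max(abs(c - mean) for c in counts)
    worst = total - mean  # deviation of the one server holding everything
    return max_dev / worst if worst else 0.0


def stdev_cost(counts):
    """Root of the squared deviations, scaled by the same worst case.
    Assumed normalization, not the one used in the actual patch."""
    total = sum(counts)
    n = len(counts)
    mean = total / n
    squared = sum((c - mean) ** 2 for c in counts)
    worst_squared = (total - mean) ** 2 + (n - 1) * mean ** 2
    return math.sqrt(squared / worst_squared) if worst_squared else 0.0


# 100 regions on 50 nodes (two each), plus 50 new empty nodes.
counts = [2] * 50 + [0] * 50
print(max_deviation_cost(counts))  # stays below the 0.05 threshold
print(stdev_cost(counts))          # rises above the 0.05 threshold
```

Under these assumptions the max-deviation measure stays under the 0.05 threshold while the standard-deviation measure clears it, which is the behavior the description argues for.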
[jira] [Comment Edited] (HBASE-25625) StochasticBalancer CostFunctions needs a better way to evaluate region count distribution
[ https://issues.apache.org/jira/browse/HBASE-25625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579306#comment-17579306 ]

Clara Xiong edited comment on HBASE-25625 at 8/14/22 3:18 AM:
--------------------------------------------------------------

But I still recommend a lower threshold for larger clusters because neither safeguard covers the case of a small portion of nodes being underloaded or overloaded, say 90%. This often happens in our env when we have hot spots that get frequent region splits. Because the clusters are large, the calculated overall imbalance is diluted. Another solution is to determine the threshold by cluster size automatically, which was part of the proposal. I'd like to hear your input too from your experience.

was (Author: claraxiong):
But I still recommend a lower threshold for larger clusters because neither safeguard covers the case of a small portion of nodes being underloaded or overloaded, say 90%. This often happens in our env when we have hot spots that get frequent region splits. Because the clusters are large, the calculated overall imbalance is not diluted. Another solution is to determine the threshold by cluster size automatically, which was part of the proposal. I'd like to hear your input too from your experience.
[jira] [Commented] (HBASE-25625) StochasticBalancer CostFunctions needs a better way to evaluate region count distribution
[ https://issues.apache.org/jira/browse/HBASE-25625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579306#comment-17579306 ]

Clara Xiong commented on HBASE-25625:
-------------------------------------

But I still recommend a lower threshold for larger clusters because neither safeguard covers the case of a small portion of nodes being underloaded or overloaded, say 90%. This often happens in our env when we have hot spots that get frequent region splits. Because the clusters are large, the calculated overall imbalance is not diluted. Another solution is to determine the threshold by cluster size automatically, which was part of the proposal. I'd like to hear your input too from your experience.
[jira] [Commented] (HBASE-25625) StochasticBalancer CostFunctions needs a better way to evaluate region count distribution
[ https://issues.apache.org/jira/browse/HBASE-25625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579300#comment-17579300 ]

Clara Xiong commented on HBASE-25625:
-------------------------------------

Created https://issues.apache.org/jira/browse/HBASE-27302 Will submit patch shortly.
[jira] [Created] (HBASE-27302) Adding a trigger for Stochastic Balancer to safeguard for upper bound outliers.
Clara Xiong created HBASE-27302:
-----------------------------------

Summary: Adding a trigger for Stochastic Balancer to safeguard for upper bound outliers.
Key: HBASE-27302
URL: https://issues.apache.org/jira/browse/HBASE-27302
Project: HBase
Issue Type: Bug
Components: Balancer
Reporter: Clara Xiong

In large clusters, if one outlier has a lot of regions, the calculated imbalance for RegionCountSkewCostFunction is quite low and often fails to trigger the balancer. For example, a node with twice the average count on a 400-node cluster only produces an imbalance of 0.004 < 0.02 (the current default threshold to trigger the balancer). An empty node has a similar effect, but we have a safeguard in place for that case: https://issues.apache.org/jira/browse/HBASE-24139

We can add a safeguard for this case too, so that we don't have to lower the threshold on larger clusters, which would make the balancer more sensitive to other minor imbalances.
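The dilution described in this issue can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the RegionCountSkewCostFunction implementation: `skew_cost` and `cluster_with_outlier` are hypothetical helpers, the cost is the sum of absolute deviations scaled by the "all regions on one server" worst case, and the 100 regions per server is an arbitrary choice. The absolute figures are therefore not expected to match the 0.004 quoted above. Only the trend matters: the same outlier yields a smaller cost as the cluster grows.

```python
def skew_cost(counts):
    """Sum of absolute deviations from the mean, scaled to [0, 1] by the
    worst case where one server holds every region. Assumed normalization."""
    n = len(counts)
    total = sum(counts)
    mean = total / n
    deviation = sum(abs(c - mean) for c in counts)
    worst = (total - mean) + (n - 1) * mean
    return deviation / worst if worst else 0.0


def cluster_with_outlier(servers, per_server=100):
    """All servers hold `per_server` regions except one holding twice that."""
    return [per_server] * (servers - 1) + [2 * per_server]


threshold = 0.02  # default trigger threshold quoted in the issue
for servers in (40, 400):
    cost = skew_cost(cluster_with_outlier(servers))
    print(servers, round(cost, 4), cost >= threshold)
```

With this normalization the cost works out to 1/(servers + 1), so the outlier triggers the balancer on a 40-node cluster but falls under the threshold on a 400-node one.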
[jira] [Comment Edited] (HBASE-25625) StochasticBalancer CostFunctions needs a better way to evaluate region count distribution
[ https://issues.apache.org/jira/browse/HBASE-25625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579115#comment-17579115 ]

Clara Xiong edited comment on HBASE-25625 at 8/12/22 8:01 PM:
--------------------------------------------------------------

[~bbeaudreault] The standard deviation solution, as [~dmanning] pointed out and as we simulated, didn't cover all cases better than linear deviation for balancing decisions. This case is more about triggering. We use a lower threshold (0.001) for our larger (500) clusters, which has worked well for us so far. We might want to add a shortcut to trigger rebalancing if any node has >= 2-fold (or any reasonable threshold) of the average load, just like the shortcut of triggering by an empty node, as a safeguard, instead of trying to find a one-size-fits-all heuristic. What do you think? Or do you have a better proposal?

was (Author: claraxiong):
[~bbeaudreault] The standard deviation solution, as [~dmanning] pointed out and as we simulated, didn't cover all cases better than linear deviation for balancing decisions. This case is more about triggering. We use a lower threshold (0.001) for our larger (500) clusters. We might want to add a shortcut to trigger rebalancing if any node has >= 2-fold (or any reasonable threshold) of the average load, just like the shortcut of triggering by an empty node, as a safeguard, instead of trying to find a one-size-fits-all heuristic. What do you think?
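The shortcut proposed in this comment could look roughly like the following. It is a minimal sketch, not the StochasticLoadBalancer logic: `should_trigger_balance`, its parameter names and the 0.02 default are illustrative assumptions, and the empty-server check only stands in for the existing safeguard mentioned in the thread.

```python
def should_trigger_balance(counts, cost, min_cost=0.02, outlier_fold=2.0):
    """Decide whether to run the balancer.

    counts       -- region count per server
    cost         -- overall imbalance computed elsewhere, in [0, 1]
    min_cost     -- threshold the cost must reach (assumed default)
    outlier_fold -- trigger if any server holds this multiple of the mean
    """
    if not counts:
        return False
    mean = sum(counts) / len(counts)
    if cost >= min_cost:
        return True   # normal path: the cluster-wide cost is high enough
    if mean > 0 and min(counts) == 0:
        return True   # stand-in for the existing empty-server safeguard
    if mean > 0 and max(counts) >= outlier_fold * mean:
        return True   # proposed safeguard: one heavily overloaded server
    return False


# One server with 2.5x the usual count on a 400-node cluster: the diluted
# cost alone would not trigger, but the outlier check does.
print(should_trigger_balance([100] * 399 + [250], cost=0.004))
```

Keeping the outlier check separate from the cost threshold is what lets the threshold stay high enough to ignore minor imbalances.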
[jira] [Commented] (HBASE-25625) StochasticBalancer CostFunctions needs a better way to evaluate region count distribution
[ https://issues.apache.org/jira/browse/HBASE-25625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17579115#comment-17579115 ]

Clara Xiong commented on HBASE-25625:
-------------------------------------

[~bbeaudreault] The standard deviation solution, as [~dmanning] pointed out and as we simulated, didn't cover all cases better than linear deviation for balancing decisions. This case is more about triggering. We use a lower threshold (0.001) for our larger (500) clusters. We might want to add a shortcut to trigger rebalancing if any node has >= 2-fold (or any reasonable threshold) of the average load, just like the shortcut of triggering by an empty node, as a safeguard, instead of trying to find a one-size-fits-all heuristic. What do you think?
[jira] [Commented] (HBASE-27119) [HBCK2] Some commands are broken after HBASE-24587
[ https://issues.apache.org/jira/browse/HBASE-27119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17555260#comment-17555260 ]

Clara Xiong commented on HBASE-27119:
-------------------------------------

An alternative fix that follows the same pattern as other commands with command options:
https://github.com/apache/hbase-operator-tools/pull/109

> [HBCK2] Some commands are broken after HBASE-24587
> --------------------------------------------------
>
> Key: HBASE-27119
> URL: https://issues.apache.org/jira/browse/HBASE-27119
> Project: HBase
> Issue Type: Bug
> Components: hbase-operator-tools, hbck2
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Major
>
> HBCK2 _replication_ and _filesystem_ commands are broken after HBASE-24587. Trying to pass the _-f_ or _--fix_ options gives the below error:
> {noformat}
> ERROR: Unrecognized option: -f
> FOR USAGE, use the -h or --help option
> 2022-06-14T16:07:32,296 INFO [main] client.ConnectionImplementation: Closing master protocol: MasterService
> Exception in thread "main" java.lang.NullPointerException
> at org.apache.hbase.HBCK2.doCommandLine(HBCK2.java:1083)
> at org.apache.hbase.HBCK2.run(HBCK2.java:982)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
> at org.apache.hbase.HBCK2.main(HBCK2.java:1318)
> {noformat}
> This is because _getInputList_ calls [here|https://github.com/apache/hbase-operator-tools/blob/master/hbase-hbck2/src/main/java/org/apache/hbase/HBCK2.java#L1073] and [here|https://github.com/apache/hbase-operator-tools/blob/master/hbase-hbck2/src/main/java/org/apache/hbase/HBCK2.java#L1082] only accept the _-i_/_--inputFiles_ options, throwing an exception if we pass _-f/--fix_ options.
> Still need to confirm if any other command is affected by this.
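The failure mode in this report, where a helper that only knows the input-file option is handed a command line that also carries the fix option, can be reproduced in miniature. The sketch below uses Python's `argparse` purely as an analogy. It is not the HBCK2 code (which is Java), and the parser-building functions are hypothetical names.

```python
import argparse


def input_only_parser():
    """Parser that knows only -i/--inputFiles, like the helper in the report."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-i", "--inputFiles", action="store_true")
    return parser


def command_parser():
    """Parser that also declares the command's own -f/--fix option."""
    parser = input_only_parser()
    parser.add_argument("-f", "--fix", action="store_true")
    return parser


def unrecognized(parser, argv):
    """Return the arguments the parser could not account for."""
    _, leftover = parser.parse_known_args(argv)
    return leftover


argv = ["-f", "peer1"]
print(unrecognized(input_only_parser(), argv))  # -f is left unrecognized
print(unrecognized(command_parser(), argv))     # only the positional remains
```

The point of the analogy is that every option a command accepts needs to be declared to whichever parser sees its arguments. Which of the two proposed fixes is preferable is left to the PR discussion.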
[jira] [Commented] (HBASE-27119) [HBCK2] Some commands are broken after HBASE-24587
[ https://issues.apache.org/jira/browse/HBASE-27119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17555227#comment-17555227 ]

Clara Xiong commented on HBASE-27119:
-------------------------------------

Thank you for catching this. I processed the order wrong for these two commands. I checked all other commands, such as bypass and assigns; they are handled properly. Will comment in detail on the PR.
[jira] [Commented] (HBASE-24587) hbck2 command should accept one or more files containing a list of region names/table names/namespaces
[ https://issues.apache.org/jira/browse/HBASE-24587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17524440#comment-17524440 ]

Clara Xiong commented on HBASE-24587:
-------------------------------------

@[~subrat.mishra] Could you review PR 105 to see if that works for you? Feedback is welcome.

> hbck2 command should accept one or more files containing a list of region names/table names/namespaces
> -------------------------------------------------------------------------------------------------------
>
> Key: HBASE-24587
> URL: https://issues.apache.org/jira/browse/HBASE-24587
> Project: HBase
> Issue Type: Improvement
> Components: hbase-operator-tools, hbck2
> Affects Versions: hbase-operator-tools-1.0.0
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Priority: Major
>
> Currently many commands accept a list of region names/table names/namespaces on the command line. We should accept paths to one or more files that contain these encoded regions, one per line. That way, this command tails nicely into an operator's incantation using grep/sed over log files.
> Similar work has been done in https://issues.apache.org/jira/browse/HBASE-23927
[jira] [Comment Edited] (HBASE-24587) hbck2 command should accept one or more files containing a list of region names/table names/namespaces
[ https://issues.apache.org/jira/browse/HBASE-24587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17499761#comment-17499761 ]

Clara Xiong edited comment on HBASE-24587 at 3/2/22, 11:27 PM:
---------------------------------------------------------------

# added a common -i option so any command that takes a list of arguments can take a list of input files.
# updated the -h message and README.
# added a unit test for the generic option and tested this option for assigns and unassigns. For other commands, since they all share the same code path, I don't see a need to add more unit tests. The existing unit tests for other commands don't touch generic options either.
# for commands that take two arguments (table|region state), I opened a subtask for taking the pairs in a list of files because that involves changing the interface of calls to server. But I have rolled the changes into the same pr.

was (Author: claraxiong):
# added a common -i option so any command that takes a list of arguments can take a list of input files.
# updated the -h message and README.
# added a unit test for the generic option and tested this option for assigns and unassigns. For other commands, since they all share the same code path, I don't see a need to add more unit tests. The existing unit tests for other commands don't touch generic options either.
# for commands that take two arguments (table|region state), I will open a separate jira and pr for taking the pairs in a list of files because that involves changing the interface of calls to server.
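The "one name per line, from one or more files" behavior described for the -i option can be sketched as follows. This is an illustration only, not the HBCK2 implementation: `names_from_lines` and `names_from_files` are hypothetical helpers, and skipping blank lines is an assumption.

```python
from pathlib import Path


def names_from_lines(lines):
    """Keep the non-empty lines, stripped of surrounding whitespace."""
    return [line.strip() for line in lines if line.strip()]


def names_from_files(paths):
    """Concatenate the names found in each input file, in file order."""
    names = []
    for path in paths:
        names.extend(names_from_lines(Path(path).read_text().splitlines()))
    return names


# Region names as they might come out of a grep/sed pipeline over logs.
print(names_from_lines(["de00010733901a05f5a2a3a382e27dd4\n", "\n", "  abc123  \n"]))
```

Because every list-taking command would share this expansion step, a single test of the helper covers them all, which matches the reasoning in the comment about not adding per-command tests.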
[jira] [Resolved] (HBASE-26785) hbck setTableState and setRegionState should accept one or more files for batch processing
[ https://issues.apache.org/jira/browse/HBASE-26785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Clara Xiong resolved HBASE-26785.
---------------------------------
Resolution: Fixed

Rolled into the same PR for parent jira

> hbck setTableState and setRegionState should accept one or more files for batch processing
> -------------------------------------------------------------------------------------------
>
> Key: HBASE-26785
> URL: https://issues.apache.org/jira/browse/HBASE-26785
> Project: HBase
> Issue Type: Sub-task
> Reporter: Clara Xiong
> Priority: Major
>
> They should take a list of input files containing a list of target and state pairs, one pair per line.
>
> This is a separate jira and pr from HBASE-24587 because that involves changing the interface of calls to server.
[jira] [Created] (HBASE-26785) hbck setTableState and setRegionState should accept one or more files for batch processing
Clara Xiong created HBASE-26785:
-----------------------------------

Summary: hbck setTableState and setRegionState should accept one or more files for batch processing
Key: HBASE-26785
URL: https://issues.apache.org/jira/browse/HBASE-26785
Project: HBase
Issue Type: Sub-task
Reporter: Clara Xiong

They should take a list of input files containing a list of target and state pairs, one pair per line.

This is a separate jira and pr from HBASE-24587 because that involves changing the interface of calls to server.
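A "target and state pair, one pair per line" file could be parsed along these lines. This is a sketch, not the hbck2 code: `parse_state_pairs` is a hypothetical helper, and the whitespace separator between target and state is an assumption, since the issue does not specify the file format.

```python
def parse_state_pairs(lines):
    """Parse '<target> <state>' lines into (target, state) tuples.

    Blank lines are skipped. Any other line that does not contain exactly
    two whitespace-separated fields is rejected with its line number.
    """
    pairs = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected '<target> <state>', got {raw!r}")
        pairs.append((fields[0], fields[1]))
    return pairs


print(parse_state_pairs(["table_a ENABLED", "", "table_b DISABLED"]))
```

Rejecting malformed lines up front, rather than skipping them, keeps a batch run from silently applying only part of the intended changes.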
[jira] [Commented] (HBASE-24587) hbck2 command should accept one or more files containing a list of region names/table names/namespaces
[ https://issues.apache.org/jira/browse/HBASE-24587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17499761#comment-17499761 ]

Clara Xiong commented on HBASE-24587:
-------------------------------------

# added a common -i option so any command that takes a list of arguments can take a list of input files.
# updated the -h message and README.
# added a unit test for the generic option and tested this option for assigns and unassigns. For other commands, since they all share the same code path, I don't see a need to add more unit tests. The existing unit tests for other commands don't touch generic options either.
# for commands that take two arguments (table|region state), I will open a separate jira and pr for taking the pairs in a list of files because that involves changing the interface of calls to server.
[jira] [Commented] (HBASE-24587) hbck2 command should accept one or more files containing a list of region names/table names/namespaces
[ https://issues.apache.org/jira/browse/HBASE-24587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498710#comment-17498710 ]

Clara Xiong commented on HBASE-24587:
-------------------------------------

I am picking this up again. Will update the PR. Input is welcome.
[jira] [Commented] (HBASE-26310) Repro balancer behavior during iterations
[ https://issues.apache.org/jira/browse/HBASE-26310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17437702#comment-17437702 ]

Clara Xiong commented on HBASE-26310:
-------------------------------------

Yes, that was what I meant. Thank you [~zhangduo]

> Repro balancer behavior during iterations
> -----------------------------------------
>
> Key: HBASE-26310
> URL: https://issues.apache.org/jira/browse/HBASE-26310
> Project: HBase
> Issue Type: Sub-task
> Components: Balancer, test
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Priority: Major
>
> All existing tests expect balancer to complete in one run. This misses temporary imbalance or inefficiency during iterations on large clusters. Recently we found such issues for cohosted region distribution. To repro the problem and validate fixes, we need a test to simulate multiple iterations.
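The idea of a test that simulates several balancer iterations, rather than expecting one run to finish the job, can be shown with a toy model. This is a sketch only, with no relation to the actual HBase test added for this issue: `balance_once` is a hypothetical greedy step that moves a bounded number of regions per run, and `run_iterations` records the state after each run so that intermediate imbalance can be inspected.

```python
def balance_once(counts, max_moves):
    """One balancer run: move up to max_moves regions, one at a time,
    from the most loaded server to the least loaded one."""
    counts = list(counts)
    for _ in range(max_moves):
        busiest = max(range(len(counts)), key=counts.__getitem__)
        idlest = min(range(len(counts)), key=counts.__getitem__)
        if counts[busiest] - counts[idlest] <= 1:
            break  # already as even as integer counts allow
        counts[busiest] -= 1
        counts[idlest] += 1
    return counts


def run_iterations(counts, max_moves, max_iterations):
    """Repeat balancer runs until nothing changes, keeping every state."""
    history = [list(counts)]
    for _ in range(max_iterations):
        following = balance_once(history[-1], max_moves)
        if following == history[-1]:
            break
        history.append(following)
    return history


for step, state in enumerate(run_iterations([10, 0, 0, 0, 0], max_moves=2, max_iterations=20)):
    print(step, state)
```

A test built this way can assert on each intermediate state, for example that no iteration makes the distribution worse, instead of checking only the final result.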
[jira] [Commented] (HBASE-26310) Repro balancer behavior during iterations
[ https://issues.apache.org/jira/browse/HBASE-26310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17437128#comment-17437128 ]

Clara Xiong commented on HBASE-26310:
-------------------------------------

The PR is rolled up in [https://github.com/apache/hbase/pull/3723]
[jira] [Updated] (HBASE-26310) Repro balancer behavior during iterations
[ https://issues.apache.org/jira/browse/HBASE-26310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Clara Xiong updated HBASE-26310:
--------------------------------
Parent: HBASE-26309
Issue Type: Sub-task (was: Bug)
[jira] [Updated] (HBASE-26310) Repro balancer behavior during iterations
[ https://issues.apache.org/jira/browse/HBASE-26310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Clara Xiong updated HBASE-26310:
--------------------------------
Parent: (was: HBASE-26311)
Issue Type: Bug (was: Sub-task)
[jira] [Comment Edited] (HBASE-26311) Balancer gets stuck in cohosted replica distribution
[ https://issues.apache.org/jira/browse/HBASE-26311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428941#comment-17428941 ] Clara Xiong edited comment on HBASE-26311 at 10/14/21, 5:41 PM: Squared deviation caused a serious regression in some scenarios: the absolute max cost became so high that the scaled cost dropped too low. Fixing that would require significant tuning of the weight factors and minCostNeedBalance. I switched back to the original proposal of standard deviation in https://issues.apache.org/jira/browse/HBASE-25625 to normalize the value, which removes the need to tune the weight factors or minCostNeedBalance. Profiling showed similarly satisfactory results in unsticking the balancer. was (Author: claraxiong): Squared deviation caused a serious regression in some scenarios: the absolute max cost became so high that the scaled cost dropped too low. Fixing that would require significant tuning of the weight factors. I switched back to the original proposal of standard deviation in https://issues.apache.org/jira/browse/HBASE-25625 to normalize the value. > Balancer gets stuck in cohosted replica distribution > > > Key: HBASE-26311 > URL: https://issues.apache.org/jira/browse/HBASE-26311 > Project: HBase > Issue Type: Bug > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > In production, we found a corner case where balancer cannot make progress > when there is cohosted replica. This is repro'ed on master branch using test > added in HBASE-26310. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-26311) Balancer gets stuck in cohosted replica distribution
[ https://issues.apache.org/jira/browse/HBASE-26311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428941#comment-17428941 ] Clara Xiong commented on HBASE-26311: - Squared deviation caused a serious regression in some scenarios: the absolute max cost became so high that the scaled cost dropped too low. Fixing that would require significant tuning of the weight factors. I switched back to the original proposal of standard deviation in https://issues.apache.org/jira/browse/HBASE-25625 to normalize the value. > Balancer gets stuck in cohosted replica distribution > > > Key: HBASE-26311 > URL: https://issues.apache.org/jira/browse/HBASE-26311 > Project: HBase > Issue Type: Bug > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > In production, we found a corner case where balancer cannot make progress > when there is cohosted replica. This is repro'ed on master branch using test > added in HBASE-26310. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26308) Sum of multiplier of cost functions is not populated properly when we have a shortcut for trigger
[ https://issues.apache.org/jira/browse/HBASE-26308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-26308: Parent: HBASE-26311 Issue Type: Sub-task (was: Bug) > Sum of multiplier of cost functions is not populated properly when we have a > shortcut for trigger > - > > Key: HBASE-26308 > URL: https://issues.apache.org/jira/browse/HBASE-26308 > Project: HBase > Issue Type: Sub-task > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Critical > > We have a couple of scenarios in which we force balancing: > * idle servers > * co-hosted regions > The code path exits before populating the sum of the multipliers of the cost functions. > This causes a wrong value to be reported in the log. As shown below, the weighted average > is not divided by the total weight, which makes the logged values inconsistent across iterations. > {quote}2021-09-24 21:46:57,881 INFO > org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Running > balancer because at least one server hosts replicas of the same region. 
> 2021-09-24 21:46:57,881 INFO > org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Start > S*tocha*sticLoadBalancer.balancer, initial weighted average > imbalance=6389.260497305375, functionCost=RegionCountSkewCostFunction : > (multiplier=500.0, imbalance=0.06659036267913739); > PrimaryRegionCountSkewCostFunction : (multiplier=500.0, > imbalance=0.05296760285663541); MoveCostFunction : (multiplier=7.0, > imbalance=0.0, balanced); ServerLocalityCostFunction : (multiplier=25.0, > imbalance=0.46286750487559114); RackLocalityCostFunction : (multiplier=15.0, > imbalance=0.2569525347374165); TableSkewCostFunction : (multiplier=500.0, > imbalance=0.3760689783169534); RegionReplicaHostCostFunction : > (multiplier=10.0, imbalance=0.0553889913899139); > RegionReplicaRackCostFunction : (multiplier=1.0, > imbalance=0.05854089790897909); ReadRequestCostFunction : (multiplier=5.0, > imbalance=0.06969346106898068); WriteRequestCostFunction : (multiplier=5.0, > imbalance=0.07834116112410174); MemStoreSizeCostFunction : (multiplier=5.0, > imbalance=0.12533769793201735); StoreFileCostFunction : (multiplier=5.0, > imbalance=0.06921401085082914); computedMaxSteps=5577401600 > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
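To make the reported defect concrete, here is a minimal sketch of a weighted average that is divided by the total weight. It is not the StochasticLoadBalancer code; the class and method names are hypothetical and chosen only for illustration. The inputs mirror the (multiplier, imbalance) pairs printed in the log above.

```java
// Hypothetical illustration only: not the HBase implementation.
final class WeightedImbalanceSketch {

  /**
   * Returns sum(multipliers[i] * costs[i]) / sum(multipliers[i]).
   * Returns 0 when the total weight is 0 so the division is always defined.
   */
  static double weightedAverage(double[] multipliers, double[] costs) {
    if (multipliers.length != costs.length) {
      throw new IllegalArgumentException("multipliers and costs must have the same length");
    }
    double weightedSum = 0.0;
    double totalWeight = 0.0;
    for (int i = 0; i < multipliers.length; i++) {
      weightedSum += multipliers[i] * costs[i];
      totalWeight += multipliers[i];
    }
    // Skipping this division is the kind of omission the issue describes.
    return totalWeight == 0.0 ? 0.0 : weightedSum / totalWeight;
  }
}
```

When every individual cost lies in [0, 1], the normalized result also lies in [0, 1], so values from different iterations stay comparable.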
[jira] [Updated] (HBASE-26310) Repro balancer behavior during iterations
[ https://issues.apache.org/jira/browse/HBASE-26310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-26310: Parent: HBASE-26311 Issue Type: Sub-task (was: Bug) > Repro balancer behavior during iterations > - > > Key: HBASE-26310 > URL: https://issues.apache.org/jira/browse/HBASE-26310 > Project: HBase > Issue Type: Sub-task > Components: Balancer, test >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > All existing tests expect balancer to complete in one run. This misses > temporary imbalance or inefficiency during iterations on large clusters. > Recently we found such issues for cohosted region distribution. To repro the > problem and validate fixes, we need a test to simulate multiple iterations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26311) Balancer gets stuck in cohosted replica distribution
[ https://issues.apache.org/jira/browse/HBASE-26311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-26311: Description: In production, we found a corner case where balancer cannot make progress when there is cohosted replica. This is repro'ed on master branch using test added in HBASE-26310. was: In production, we found a corner case where balancer cannot make progress when there is cohosted replica. This is repro'ed on master branch using test added in HBASE-26310. The two cost functions do not provide a proper evaluation that would let the balancer make progress. > Balancer gets stuck in cohosted replica distribution > > > Key: HBASE-26311 > URL: https://issues.apache.org/jira/browse/HBASE-26311 > Project: HBase > Issue Type: Bug > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > In production, we found a corner case where balancer cannot make progress > when there is cohosted replica. This is repro'ed on master branch using test > added in HBASE-26310. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26311) Balancer gets stuck in cohosted replica distribution
[ https://issues.apache.org/jira/browse/HBASE-26311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-26311: Description: In production, we found a corner case where balancer cannot make progress when there is cohosted replica. This is repro'ed on master branch using test added in HBASE-26310. The two cost functions isn't provide proper evaluation so balancer could make progress. was: In production, we found a corner case where balancer cannot make progress when there is cohosted replica. This is repro'ed on master branch using test added in HBASE-26310. The two cost functions isn't provide proper evaluation so balancer could make progress. Another observation is the imbalance weight is not updated by the cost functions properly during plan generation. The subsequent run reports much high imbalance. {quote}2021-09-24 22:26:56,039 INFO org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Finished computing new moving plan. Computation took 241 ms to try 1284702 different iterations. Found a solution that moves 6941 regions; Going from a computed imbalance of 6389.260497305375 to a new imbalance of 21.03904901349833. 2021-09-24 22:33:40,961 INFO org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Running balancer because at least one server hosts replicas of the same region. 
2021-09-24 22:33:40,961 INFO org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Start S*tocha*sticLoadBalancer.balancer, initial weighted average imbalance=6726.357026325619, functionCost=RegionCountSkewCostFunction : (multiplier=500.0, imbalance=0.07721156356401288); PrimaryRegionCountSkewCostFunction : (multiplier=500.0, imbalance=0.06298215530179263); MoveCostFunction : (multiplier=7.0, imbalance=0.0, balanced); ServerLocalityCostFunction : (multiplier=25.0, imbalance=0.463289517245148); RackLocalityCostFunction : (multiplier=15.0, imbalance=0.25670928199727017); TableSkewCostFunction : (multiplier=500.0, imbalance=0.4378048676389543); RegionReplicaHostCostFunction : (multiplier=10.0, imbalance=0.05809798270893372); RegionReplicaRackCostFunction : (multiplier=1.0, imbalance=0.061018251681075886); ReadRequestCostFunction : (multiplier=5.0, imbalance=0.08235908576054465); WriteRequestCostFunction : (multiplier=5.0, imbalance=0.09385090828285425); MemStoreSizeCostFunction : (multiplier=5.0, imbalance=0.1327376982847744); StoreFileCostFunction : (multiplier=5.0, imbalance=0.07986594927573858); computedMaxSteps=5579331200 {quote} > Balancer gets stuck in cohosted replica distribution > > > Key: HBASE-26311 > URL: https://issues.apache.org/jira/browse/HBASE-26311 > Project: HBase > Issue Type: Bug > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > In production, we found a corner case where balancer cannot make progress > when there is cohosted replica. This is repro'ed on master branch using test > added in HBASE-26310. The two cost functions isn't provide proper evaluation > so balancer could make progress. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-26311) Balancer gets stuck in cohosted replica distribution
[ https://issues.apache.org/jira/browse/HBASE-26311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424743#comment-17424743 ] Clara Xiong edited comment on HBASE-26311 at 10/10/21, 10:26 PM: - When balancer has to satisfy other constraints, even region count distribution just cannot be guaranteed, as in existing test case TestStochasticLoadBalancerRegionReplicaWithRacks. Because replica distribution has much higher weight than region count skew, the rack with fewer servers tend to get more regions than those with more servers. In this test case, server 0 and 1 are on the same rack while 2 and 3 are on each's rack because servers cannot be place completely evenly. The resulted region count distribution can be [2,2, 4, 4] or be[1, 3, 4, 4]so that we have no replicas of the same region on the first rack. So we have to have fewer regions per server on the first two servers. With the current algorithm, the costs of two plan are the same for region count skew because only linear deviation to ideal average is considered. It can get much more extreme when we have 5 servers for this test case: [1,3,3,3,5]or [2,2,3,3,5] depending on the random walk. But since the algorithm says they are the same expense for region count skew, balancer can be stuck at the former. The more servers we have, as long as the RS counts are not completely even, which happens all the time, the more variation of results we will see depending the random walk. But once we reach the extreme case, balancer is stuck because the cost function says moving doesn't gain. I am proposing using the sum of square of deviation for load functions, inline with replica cost functions. see https://issues.apache.org/jira/browse/HBASE-25625 was (Author: claraxiong): When balancer has to satisfy other constraints, even region count distribution just cannot be guaranteed, as in existing test case TestStochasticLoadBalancerRegionReplicaWithRacks. 
Because replica distribution has much higher weight than region count skew, the rack with fewer servers tend to get more regions than those with more servers. In this test case, server 0 and 1 are on the same rack while 2 and 3 are on each's rack because servers cannot be place completely evenly. The resulted region count distribution can be [2,2, 4, 4] or be[1, 3, 4, 4]so that we have no replicas of the same region on the first rack. So we have to have fewer regions per server on the first two servers. With the current algorithm, the costs of two plan are the same for region count skew because only linear deviation to ideal average is considered. It can get much more extreme when we have 5 servers for this test case: [1,3,3,3,5]or [2,2,3,3,5] depending on the random walk. But since the algorithm says they are the same expense for region count skew, balancer can be stuck at the former. The more servers we have, as long as the RS counts are not completely even, which happens all the time, the more variation of results we will see depending the random walk. But once we reach the extreme case, balancer is stuck because the cost function says moving doesn't gain. I am proposing using the sum of square of deviation for load functions, inline with replica cost functions. we don't need standard deviation so we can keep it simple and fast. see https://issues.apache.org/jira/browse/HBASE-25625 > Balancer gets stuck in cohosted replica distribution > > > Key: HBASE-26311 > URL: https://issues.apache.org/jira/browse/HBASE-26311 > Project: HBase > Issue Type: Bug > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > In production, we found a corner case where balancer cannot make progress > when there is cohosted replica. This is repro'ed on master branch using test > added in HBASE-26310. The two cost functions isn't provide proper evaluation > so balancer could make progress. 
> > Another observation is the imbalance weight is not updated by the cost > functions properly during plan generation. The subsequent run reports much > high imbalance. > {quote}2021-09-24 22:26:56,039 INFO > org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Finished > computing new moving plan. Computation took 241 ms to try 1284702 > different iterations. Found a solution that moves 6941 regions; Going from a > computed imbalance of 6389.260497305375 to a new imbalance of > 21.03904901349833. > 2021-09-24 22:33:40,961 INFO > org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Running > balancer because at least one server hosts replicas of the same region. > 2021-09-24 22:33:40,961 INFO > org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Start >
[jira] [Comment Edited] (HBASE-25625) StochasticBalancer CostFunctions needs a better way to evaluate region count distribution
[ https://issues.apache.org/jira/browse/HBASE-25625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423413#comment-17423413 ] Clara Xiong edited comment on HBASE-25625 at 10/10/21, 10:25 PM: - The problem surfaced again on a few different clusters where balancer keeps getting triggered by cohosted replicas. https://issues.apache.org/jira/browse/HBASE-26309 When balancer has to satisfy other constraints, even region count distribution just cannot be guaranteed, as in existing test case TestStochasticLoadBalancerRegionReplicaWithRacks. Because replica distribution has much higher weight than region count skew, the rack with fewer servers tend to get more regions than those with more servers. In this test case, server 0 and 1 are on the same rack while 2 and 3 are on each's rack because servers cannot be place completely evenly. The resulted region count distribution can be [2,2, 4, 4] or be[1, 3, 4, 4]so that we have no replicas of the same region on the first rack. So we have to have fewer regions per server on the first two servers. With the current algorithm, the costs of two plan are the same for region count skew because only linear deviation to ideal average is considered. It can get much more extreme when we have 8 servers for this test case: [1,3,3,3,5]or [2,2,3,3,5] depending on the random walk. But since the algorithm says they are the same expense for region count skew, balancer can be stuck at the former. The more servers we have, as long as the RS counts are not completely even, which happens all the time, the more variation of results we will see depending the random walk. But once we reach the extreme case, balancer is stuck because the cost function says moving doesn't gain. I am proposing using the sum of square of deviation for load functions, inline with replica cost functions. I didn't pursue the work because simulation showed it doesn’t necessarily increase the chance to trigger balancer run for the more uneven cases. 
But it will balance region counts better when other constraints make perfect balancing of region count impossible. Here is the simulation results: !image-2021-10-05-17-17-50-944.png! was (Author: claraxiong): The problem surfaced again on a few different clusters where balancer keeps getting triggered by cohosted replicas. https://issues.apache.org/jira/browse/HBASE-26309 When balancer has to satisfy other constraints, even region count distribution just cannot be guaranteed, as in existing test case TestStochasticLoadBalancerRegionReplicaWithRacks. Because replica distribution has much higher weight than region count skew, the rack with fewer servers tend to get more regions than those with more servers. In this test case, server 0 and 1 are on the same rack while 2 and 3 are on each's rack because servers cannot be place completely evenly. The resulted region count distribution can be [2,2, 4, 4] or be[1, 3, 4, 4]so that we have no replicas of the same region on the first rack. So we have to have fewer regions per server on the first two servers. With the current algorithm, the costs of two plan are the same for region count skew because only linear deviation to ideal average is considered. It can get much more extreme when we have 8 servers for this test case: [1,3,3,3,5]or [2,2,3,3,5] depending on the random walk. But since the algorithm says they are the same expense for region count skew, balancer can be stuck at the former. The more servers we have, as long as the RS counts are not completely even, which happens all the time, the more variation of results we will see depending the random walk. But once we reach the extreme case, balancer is stuck because the cost function says moving doesn't gain. I am proposing using the sum of square of deviation for load functions, inline with replica cost functions. we don't need standard deviation so we can keep it simple and fast. 
I didn't pursue the work because simulation showed it doesn’t necessarily increase the chance to trigger balancer run for the more uneven cases. But it will balance region counts better when other constraints make perfect balancing of region count impossible. Here is the simulation results: !image-2021-10-05-17-17-50-944.png! > StochasticBalancer CostFunctions needs a better way to evaluate region count > distribution > - > > Key: HBASE-25625 > URL: https://issues.apache.org/jira/browse/HBASE-25625 > Project: HBase > Issue Type: Improvement > Components: Balancer, master >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > Attachments: image-2021-10-05-17-17-50-944.png > > > Currently CostFunctions including
[jira] [Updated] (HBASE-26337) Optimization for weighted random generators
[ https://issues.apache.org/jira/browse/HBASE-26337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-26337: Description: Currently we use four move candidate generators and pick one randomly for every move with even probability, all with optimization associated with a certain group of cost functions. We can assign weight for the random picking of generators based on balancing pressure on each group. (was: urrently we use four move candidate generators and pick one randomly for every move with even probability, all with optimization associated with a certain group of cost functions. We can assign weight for the random picking of generators based on balancing pressure on each group. ) > Optimization for weighted random generators > --- > > Key: HBASE-26337 > URL: https://issues.apache.org/jira/browse/HBASE-26337 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > Currently we use four move candidate generators and pick one randomly for > every move with even probability, all with optimization associated with a > certain group of cost functions. We can assign weight for the random picking > of generators based on balancing pressure on each group. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-26311) Balancer gets stuck in cohosted replica distribution
[ https://issues.apache.org/jira/browse/HBASE-26311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17425839#comment-17425839 ] Clara Xiong commented on HBASE-26311: - Added a writeup for the bigger problem we ran into on a cluster on cloud. > Balancer gets stuck in cohosted replica distribution > > > Key: HBASE-26311 > URL: https://issues.apache.org/jira/browse/HBASE-26311 > Project: HBase > Issue Type: Bug > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > In production, we found a corner case where balancer cannot make progress > when there is cohosted replica. This is repro'ed on master branch using test > added in HBASE-26310. The two cost functions isn't provide proper evaluation > so balancer could make progress. > > Another observation is the imbalance weight is not updated by the cost > functions properly during plan generation. The subsequent run reports much > high imbalance. > {quote}2021-09-24 22:26:56,039 INFO > org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Finished > computing new moving plan. Computation took 241 ms to try 1284702 > different iterations. Found a solution that moves 6941 regions; Going from a > computed imbalance of 6389.260497305375 to a new imbalance of > 21.03904901349833. > 2021-09-24 22:33:40,961 INFO > org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Running > balancer because at least one server hosts replicas of the same region. 
> 2021-09-24 22:33:40,961 INFO > org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Start > S*tocha*sticLoadBalancer.balancer, initial weighted average > imbalance=6726.357026325619, functionCost=RegionCountSkewCostFunction : > (multiplier=500.0, imbalance=0.07721156356401288); > PrimaryRegionCountSkewCostFunction : (multiplier=500.0, > imbalance=0.06298215530179263); MoveCostFunction : (multiplier=7.0, > imbalance=0.0, balanced); ServerLocalityCostFunction : (multiplier=25.0, > imbalance=0.463289517245148); RackLocalityCostFunction : (multiplier=15.0, > imbalance=0.25670928199727017); TableSkewCostFunction : (multiplier=500.0, > imbalance=0.4378048676389543); RegionReplicaHostCostFunction : > (multiplier=10.0, imbalance=0.05809798270893372); > RegionReplicaRackCostFunction : (multiplier=1.0, > imbalance=0.061018251681075886); ReadRequestCostFunction : (multiplier=5.0, > imbalance=0.08235908576054465); WriteRequestCostFunction : (multiplier=5.0, > imbalance=0.09385090828285425); MemStoreSizeCostFunction : (multiplier=5.0, > imbalance=0.1327376982847744); StoreFileCostFunction : (multiplier=5.0, > imbalance=0.07986594927573858); computedMaxSteps=5579331200 > {quote} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-26337) Optimization for weighted random generators
[ https://issues.apache.org/jira/browse/HBASE-26337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17425824#comment-17425824 ] Clara Xiong commented on HBASE-26337: - 4 move candidate generators: * LoadCandidateGenerator (optimized for RegionCountSkewCostFunction) * LocalityCandidateGenerator (optimized for ServerLocalityCostFunction and RackLocalityCostFunction) * RegionReplicaRackCandidateGenerator (optimized for RegionReplicaHostCostFunction and RegionReplicaRackCostFunction) * RandomCandidateGenerator (for all cost functions) > Optimization for weighted random generators > --- > > Key: HBASE-26337 > URL: https://issues.apache.org/jira/browse/HBASE-26337 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > Currently we use four move candidate generators and pick one randomly for > every move with even probability, all with optimization associated with a > certain group of cost functions. We can assign weight for the random picking > of generators based on balancing pressure on each group. -- This message was sent by Atlassian Jira (v8.3.4#803005)
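The proposal to pick generators by weight rather than with even probability can be sketched as below. This is only an illustration of weighted random selection under stated assumptions: the class name, the method name, and the idea of passing one weight per generator (for example, derived from the cost of its associated cost-function group) are hypothetical and not taken from the HBase code base.

```java
import java.util.Random;

// Hypothetical illustration only: not the HBase implementation.
final class WeightedGeneratorPickSketch {

  /**
   * Picks index i with probability weights[i] / sum(weights).
   * Weights must be non-negative and have a positive sum.
   */
  static int pick(double[] weights, Random random) {
    double total = 0.0;
    int lastPositive = -1;
    for (int i = 0; i < weights.length; i++) {
      if (weights[i] < 0.0) {
        throw new IllegalArgumentException("weights must be non-negative");
      }
      if (weights[i] > 0.0) {
        lastPositive = i;
      }
      total += weights[i];
    }
    if (lastPositive < 0) {
      throw new IllegalArgumentException("at least one weight must be positive");
    }
    // Draw a point in [0, total) and walk the cumulative weights until it is covered.
    double r = random.nextDouble() * total;
    for (int i = 0; i < weights.length; i++) {
      r -= weights[i];
      if (r < 0.0) {
        return i;
      }
    }
    // Floating-point rounding can leave r at 0; fall back to the last eligible index.
    return lastPositive;
  }
}
```

With equal weights this reduces to the even-probability pick described in the issue; a generator with weight 0 is never selected.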
[jira] [Created] (HBASE-26337) Optimization for weighted random generators
Clara Xiong created HBASE-26337: --- Summary: Optimization for weighted random generators Key: HBASE-26337 URL: https://issues.apache.org/jira/browse/HBASE-26337 Project: HBase Issue Type: Improvement Components: Balancer Reporter: Clara Xiong Assignee: Clara Xiong Currently we use four move candidate generators and pick one randomly for every move with even probability, all with optimization associated with a certain group of cost functions. We can assign weight for the random picking of generators based on balancing pressure on each group. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Issue Comment Deleted] (HBASE-26309) Balancer tends to move regions to the server at the end of list
[ https://issues.apache.org/jira/browse/HBASE-26309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-26309: Comment: was deleted (was: When balancer has to satisfy other constraints, even region count distribution just cannot be guaranteed, as in existing test case TestStochasticLoadBalancerRegionReplicaWithRacks. Because replica distribution has much higher weight than region count skew, the rack with fewer servers tend to get more regions than those with more servers. In this test case, server 0 and 1 are on the same rack while 2 and 3 are on each's rack because servers cannot be place completely evenly. The resulted region count distribution can be [2,2, 4, 4] or be[1, 3, 4, 4]so that we have no replicas of the same region on the first rack. So we have to have fewer regions per server on the first two servers. With the current algorithm, the costs of two plan are the same for region count skew because only linear deviation to ideal average is considered. It can get much more extreme when we have 5 servers for this test case: [1,3,3,3,5]or [2,2,3,3,5] depending on the random walk. But since the algorithm says they are the same expense for region count skew, balancer can be stuck at the former. The more servers we have, as long as the RS counts are not completely even, which happens all the time, the more variation of results we will see depending the random walk. But once we reach the extreme case, balancer is stuck because the cost function says moving doesn't gain. I am proposing using the sum of square of deviation for load functions, inline with replica cost functions. we don't need standard deviation so we can keep it simple and fast. 
see https://issues.apache.org/jira/browse/HBASE-25625) > Balancer tends to move regions to the server at the end of list > --- > > Key: HBASE-26309 > URL: https://issues.apache.org/jira/browse/HBASE-26309 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > When we have cohosted replicas, cohosted replica distribution has priority to > region count skew. So we won't see region count getting balanced until the > replicas distributed. During this time, master UI shows a drift of regions to > the server at the end of list and causes significant overload. This shows a > bias of random region pick and needs to be addressed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-26311) Balancer gets stuck in cohosted replica distribution
[ https://issues.apache.org/jira/browse/HBASE-26311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424743#comment-17424743 ] Clara Xiong commented on HBASE-26311: - When the balancer has to satisfy other constraints, an even region count distribution cannot be guaranteed, as in the existing test case TestStochasticLoadBalancerRegionReplicaWithRacks. Because replica distribution has a much higher weight than region count skew, racks with fewer servers tend to get more regions per server than racks with more servers. In this test case, servers 0 and 1 are on the same rack while servers 2 and 3 are each on their own rack, because servers cannot be placed completely evenly. The resulting region count distribution can be [2, 2, 4, 4] or [1, 3, 4, 4], so that we have no replicas of the same region on the first rack. So we have to have fewer regions per server on the first two servers. With the current algorithm, the costs of the two plans are the same for region count skew because only the linear deviation from the ideal average is considered. It can get much more extreme when we have 5 servers for this test case: [1, 3, 3, 3, 5] or [2, 2, 3, 3, 5], depending on the random walk. But since the algorithm says they have the same cost for region count skew, the balancer can get stuck at the former. The more servers we have, as long as the RS counts are not completely even, which happens all the time, the more variation in results we will see depending on the random walk. But once we reach the extreme case, the balancer is stuck because the cost function says moving gains nothing. I am proposing to use the sum of squared deviations for load functions, in line with the replica cost functions. We don't need standard deviation, so we can keep it simple and fast. 
see https://issues.apache.org/jira/browse/HBASE-25625 > Balancer gets stuck in cohosted replica distribution > > > Key: HBASE-26311 > URL: https://issues.apache.org/jira/browse/HBASE-26311 > Project: HBase > Issue Type: Bug > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > In production, we found a corner case where balancer cannot make progress > when there is cohosted replica. This is repro'ed on master branch using test > added in HBASE-26310. The two cost functions isn't provide proper evaluation > so balancer could make progress. > > Another observation is the imbalance weight is not updated by the cost > functions properly during plan generation. The subsequent run reports much > high imbalance. > {quote}2021-09-24 22:26:56,039 INFO > org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Finished > computing new moving plan. Computation took 241 ms to try 1284702 > different iterations. Found a solution that moves 6941 regions; Going from a > computed imbalance of 6389.260497305375 to a new imbalance of > 21.03904901349833. > 2021-09-24 22:33:40,961 INFO > org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Running > balancer because at least one server hosts replicas of the same region. 
> 2021-09-24 22:33:40,961 INFO > org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Start > S*tocha*sticLoadBalancer.balancer, initial weighted average > imbalance=6726.357026325619, functionCost=RegionCountSkewCostFunction : > (multiplier=500.0, imbalance=0.07721156356401288); > PrimaryRegionCountSkewCostFunction : (multiplier=500.0, > imbalance=0.06298215530179263); MoveCostFunction : (multiplier=7.0, > imbalance=0.0, balanced); ServerLocalityCostFunction : (multiplier=25.0, > imbalance=0.463289517245148); RackLocalityCostFunction : (multiplier=15.0, > imbalance=0.25670928199727017); TableSkewCostFunction : (multiplier=500.0, > imbalance=0.4378048676389543); RegionReplicaHostCostFunction : > (multiplier=10.0, imbalance=0.05809798270893372); > RegionReplicaRackCostFunction : (multiplier=1.0, > imbalance=0.061018251681075886); ReadRequestCostFunction : (multiplier=5.0, > imbalance=0.08235908576054465); WriteRequestCostFunction : (multiplier=5.0, > imbalance=0.09385090828285425); MemStoreSizeCostFunction : (multiplier=5.0, > imbalance=0.1327376982847744); StoreFileCostFunction : (multiplier=5.0, > imbalance=0.07986594927573858); computedMaxSteps=5579331200 > {quote} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
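The comment's argument about linear versus squared deviation can be checked with a few lines of arithmetic. The sketch below is illustrative only (class and method names are hypothetical, not HBase code); it computes both measures for the region count distributions quoted in the comment.

```java
// Hypothetical illustration only: not the HBase implementation.
final class SkewCostSketch {

  /** Arithmetic mean of the per-server region counts. */
  static double mean(int[] counts) {
    double sum = 0.0;
    for (int c : counts) {
      sum += c;
    }
    return sum / counts.length;
  }

  /** Sum of |count - mean|: the linear measure the comment says cannot tell the plans apart. */
  static double sumAbsDeviation(int[] counts) {
    double m = mean(counts);
    double total = 0.0;
    for (int c : counts) {
      total += Math.abs(c - m);
    }
    return total;
  }

  /** Sum of (count - mean)^2: the measure proposed in the comment. */
  static double sumSquaredDeviation(int[] counts) {
    double m = mean(counts);
    double total = 0.0;
    for (int c : counts) {
      total += (c - m) * (c - m);
    }
    return total;
  }
}
```

For [2, 2, 4, 4] and [1, 3, 4, 4] the linear measure is 4 in both cases, while the squared measure is 4 versus 6. For [1, 3, 3, 3, 5] and [2, 2, 3, 3, 5] the linear measure is again 4 for both, while the squared measure is 8 versus 6, so only the squared measure gives the balancer a reason to leave the more extreme layout.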
[jira] [Updated] (HBASE-25625) StochasticBalancer CostFunctions needs a better way to evaluate region count distribution
[ https://issues.apache.org/jira/browse/HBASE-25625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-25625: Summary: StochasticBalancer CostFunctions needs a better way to evaluate region count distribution (was: StochasticBalancer CostFunctions needs a better way to evaluate resource distribution) > StochasticBalancer CostFunctions needs a better way to evaluate region count > distribution > - > > Key: HBASE-25625 > URL: https://issues.apache.org/jira/browse/HBASE-25625 > Project: HBase > Issue Type: Improvement > Components: Balancer, master >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > Attachments: image-2021-10-05-17-17-50-944.png > > > Currently, cost functions including RegionCountSkewCostFunction, > PrimaryRegionCountSkewCostFunction and all load cost functions calculate the > unevenness of the distribution by summing the deviation per region > server. This simple implementation works when the cluster is small. But when > the cluster gets larger, with more region servers and regions, it doesn't work > well with hot spots or a small number of unbalanced servers. The proposal is > to use the standard deviation of the count per region server to capture the > existence of a small portion of region servers with overwhelming > load/allocation. > TableSkewCostFunction uses the sum, over all tables, of the maximum per-server > deviation in region count as the measure of unevenness. It doesn't work in a very common > scenario in operations. Say we have 100 regions on 50 nodes, two on each. We > add 50 new nodes and they have 0 each. The max deviation from the mean is 1, > compared to 99 in the worst case scenario of 100 regions on a single server. > The normalized cost is 1/99 = 0.010 < default threshold of 0.05. The balancer > wouldn't move. The proposal is to use the standard deviation of the count > per region server to detect this scenario, generating a cost of 3.1/31 = 0.1 > in this case. 
> Patch is in test and will follow shortly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
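The arithmetic in the description above can be checked with a small sketch. This is not the HBase implementation: the function names are hypothetical, and scaling by the "all regions on one server" worst case is an assumption taken from the wording of the description. The exact intermediate figures may differ from those quoted in the issue; only the ratio of roughly 0.1 is being illustrated.

```python
import math


def max_deviation_cost(counts):
    """Max deviation from the mean, scaled by the worst case
    (all regions on a single server). Hypothetical helper."""
    total, n = sum(counts), len(counts)
    mean = total / n
    worst = total - mean  # deviation of the one loaded server in the worst case
    return max(abs(c - mean) for c in counts) / worst


def std_deviation_cost(counts):
    """Population standard deviation, scaled by the standard deviation
    of the same worst case. Hypothetical helper."""
    total, n = sum(counts), len(counts)
    mean = total / n

    def std(xs):
        return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

    worst_case = [total] + [0] * (n - 1)
    return std(counts) / std(worst_case)


# 100 regions on 50 nodes (two each), then 50 empty nodes are added.
counts = [2] * 50 + [0] * 50
print(round(max_deviation_cost(counts), 4))  # 0.0101, below the 0.05 threshold
print(round(std_deviation_cost(counts), 4))  # 0.1005, above the 0.05 threshold
```

Under these assumptions the max-deviation measure stays below the 0.05 threshold mentioned in the description, while the standard-deviation measure exceeds it.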
[jira] [Comment Edited] (HBASE-25625) StochasticBalancer CostFunctions needs a better way to evaluate resource distribution
[ https://issues.apache.org/jira/browse/HBASE-25625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423413#comment-17423413 ] Clara Xiong edited comment on HBASE-25625 at 10/6/21, 12:17 AM: The problem surfaced again on a few different clusters where balancer keeps getting triggered by cohosted replicas. https://issues.apache.org/jira/browse/HBASE-26309 When balancer has to satisfy other constraints, even region count distribution just cannot be guaranteed, as in existing test case TestStochasticLoadBalancerRegionReplicaWithRacks. Because replica distribution has much higher weight than region count skew, the rack with fewer servers tend to get more regions than those with more servers. In this test case, server 0 and 1 are on the same rack while 2 and 3 are on each's rack because servers cannot be place completely evenly. The resulted region count distribution can be [2,2, 4, 4] or be[1, 3, 4, 4]so that we have no replicas of the same region on the first rack. So we have to have fewer regions per server on the first two servers. With the current algorithm, the costs of two plan are the same for region count skew because only linear deviation to ideal average is considered. It can get much more extreme when we have 8 servers for this test case: [1,3,3,3,5]or [2,2,3,3,5] depending on the random walk. But since the algorithm says they are the same expense for region count skew, balancer can be stuck at the former. The more servers we have, as long as the RS counts are not completely even, which happens all the time, the more variation of results we will see depending the random walk. But once we reach the extreme case, balancer is stuck because the cost function says moving doesn't gain. I am proposing using the sum of square of deviation for load functions, inline with replica cost functions. we don't need standard deviation so we can keep it simple and fast. 
I didn't pursue the work because simulation showed it doesn’t necessarily increase the chance to trigger balancer run for the more uneven cases. But it will balance region counts better when other constraints make perfect balancing of region count impossible. Here is the simulation results: !image-2021-10-05-17-17-50-944.png! was (Author: claraxiong): The problem surfaced again on a few different clusters where balancer keeps getting triggered by cohosted replicas. https://issues.apache.org/jira/browse/HBASE-26309 When balancer has to satisfy other constraints, even region count distribution just cannot be guaranteed, as in existing test case TestStochasticLoadBalancerRegionReplicaWithRacks. Because replica distribution has much higher weight than region count skew, the rack with fewer servers tend to get more regions than those with more servers. In this test case, server 0 and 1 are on the same rack while 2 and 3 are on each's rack because servers cannot be place completely evenly. The resulted region count distribution can be [2,2, 4, 4] or be[1, 3, 4, 4]so that we have no replicas of the same region on the first rack. So we have to have fewer regions per server on the first two servers. With the current algorithm, the costs of two plan are the same for region count skew because only linear deviation to ideal average is considered. It can get much more extreme when we have 8 servers for this test case: [1,3,3,3,5]or [2,2,3,3,5] depending on the random walk. But since the algorithm says they are the same expense for region count skew, balancer can be stuck at the former. The more servers we have, as long as the RS counts are not completely even, which happens all the time, the more variation of results we will see depending the random walk. But once we reach the extreme case, balancer is stuck because the cost function says moving doesn't gain. I am proposing using the sum of square of deviation for load functions, inline with replica cost functions. 
we don't need standard deviation so we can keep it simple and fast. > StochasticBalancer CostFunctions needs a better way to evaluate resource > distribution > - > > Key: HBASE-25625 > URL: https://issues.apache.org/jira/browse/HBASE-25625 > Project: HBase > Issue Type: Improvement > Components: Balancer, master >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > Attachments: image-2021-10-05-17-17-50-944.png > > > Currently CostFunctions including RegionCountSkewCostFunctions, > PrimaryRegionCountSkewCostFunctions and all load cost functions calculate the > unevenness of the distribution by getting the sum of deviation per region > server. This simple implementation works when the cluster is small. But when > the cluster get larger with more
[jira] [Comment Edited] (HBASE-26309) Balancer tends to move regions to the server at the end of list
[ https://issues.apache.org/jira/browse/HBASE-26309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423416#comment-17423416 ] Clara Xiong edited comment on HBASE-26309 at 10/6/21, 12:15 AM: When balancer has to satisfy other constraints, even region count distribution just cannot be guaranteed, as in existing test case TestStochasticLoadBalancerRegionReplicaWithRacks. Because replica distribution has much higher weight than region count skew, the rack with fewer servers tend to get more regions than those with more servers. In this test case, server 0 and 1 are on the same rack while 2 and 3 are on each's rack because servers cannot be place completely evenly. The resulted region count distribution can be [2,2, 4, 4] or be[1, 3, 4, 4]so that we have no replicas of the same region on the first rack. So we have to have fewer regions per server on the first two servers. With the current algorithm, the costs of two plan are the same for region count skew because only linear deviation to ideal average is considered. It can get much more extreme when we have 5 servers for this test case: [1,3,3,3,5]or [2,2,3,3,5] depending on the random walk. But since the algorithm says they are the same expense for region count skew, balancer can be stuck at the former. The more servers we have, as long as the RS counts are not completely even, which happens all the time, the more variation of results we will see depending the random walk. But once we reach the extreme case, balancer is stuck because the cost function says moving doesn't gain. I am proposing using the sum of square of deviation for load functions, inline with replica cost functions. we don't need standard deviation so we can keep it simple and fast. 
see https://issues.apache.org/jira/browse/HBASE-25625 was (Author: claraxiong): When balancer has to satisfy other constraints, even region count distribution just cannot be guaranteed, as in existing test case TestStochasticLoadBalancerRegionReplicaWithRacks. Because replica distribution has much higher weight than region count skew, the rack with fewer servers tend to get more regions than those with more servers. In this test case, server 0 and 1 are on the same rack while 2 and 3 are on each's rack because servers cannot be place completely evenly. The resulted region count distribution can be [2,2, 4, 4] or be[1, 3, 4, 4]so that we have no replicas of the same region on the first rack. So we have to have fewer regions per server on the first two servers. With the current algorithm, the costs of two plan are the same for region count skew because only linear deviation to ideal average is considered. It can get much more extreme when we have 5 servers for this test case: [1,3,3,3,5]or [2,2,3,3,5] depending on the random walk. But since the algorithm says they are the same expense for region count skew, balancer can be stuck at the former. The more servers we have, as long as the RS counts are not completely even, which happens all the time, the more variation of results we will see depending the random walk. But once we reach the extreme case, balancer is stuck because the cost function says moving doesn't gain. I am proposing using the sum of square of deviation for load functions, inline with replica cost functions. we don't need standard deviation so we can keep it simple and fast. > Balancer tends to move regions to the server at the end of list > --- > > Key: HBASE-26309 > URL: https://issues.apache.org/jira/browse/HBASE-26309 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > When we have cohosted replicas, cohosted replica distribution has priority to > region count skew. 
So we won't see region count getting balanced until the > replicas distributed. During this time, master UI shows a drift of regions to > the server at the end of list and causes significant overload. This shows a > bias of random region pick and needs to be addressed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-26309) Balancer tends to move regions to the server at the end of list
[ https://issues.apache.org/jira/browse/HBASE-26309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424742#comment-17424742 ] Clara Xiong commented on HBASE-26309: - There is also a reason for the bias in the server pick. LoadCandidateGenerator always picks the first server among those with the same region count. On a large cluster, it often gets stuck and also causes drifting. > Balancer tends to move regions to the server at the end of list > --- > > Key: HBASE-26309 > URL: https://issues.apache.org/jira/browse/HBASE-26309 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > When we have cohosted replicas, cohosted replica distribution takes priority over > region count skew. So we won't see region count getting balanced until the > replicas are distributed. During this time, master UI shows a drift of regions to > the server at the end of the list, which causes significant overload. This shows a > bias in the random region pick and needs to be addressed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
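The tie-breaking bias mentioned in the comment above can be illustrated with a toy sketch. It does not reproduce the actual candidate generator code; both function names are hypothetical, and the random tie-break is only one possible remedy, not necessarily the one adopted.

```python
import random


def pick_first_least_loaded(counts):
    """Deterministic pick: always the first server with the minimum count."""
    return counts.index(min(counts))


def pick_random_least_loaded(counts, rng):
    """Randomized pick: choose uniformly among all servers tied at the minimum."""
    least = min(counts)
    tied = [i for i, c in enumerate(counts) if c == least]
    return rng.choice(tied)


counts = [3, 3, 3, 5]  # three servers tied at the lowest region count
rng = random.Random(0)

first_picks = {pick_first_least_loaded(counts) for _ in range(1000)}
random_picks = {pick_random_least_loaded(counts, rng) for _ in range(1000)}
print(first_picks)   # only index 0 is ever chosen
print(random_picks)  # indices 0, 1 and 2 are all chosen
```

With the deterministic pick, the same server is selected every time the counts tie, which is the kind of bias that can show up as drift on a large cluster.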
[jira] [Assigned] (HBASE-26327) Replicas cohosted on a rack shouldn't keep triggering Balancer
[ https://issues.apache.org/jira/browse/HBASE-26327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong reassigned HBASE-26327: --- Assignee: Clara Xiong > Replicas cohosted on a rack shouldn't keep triggering Balancer > -- > > Key: HBASE-26327 > URL: https://issues.apache.org/jira/browse/HBASE-26327 > Project: HBase > Issue Type: Sub-task > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > Currently, the balancer has a shortcut check for cohosted replicas of the same > region on a host or rack and will keep triggering the balancer as long as the count is non-zero. > With the trend toward Kubernetes and cloud solutions for HBase, operators don't have > full control of the topology or are not even aware of it. There are > cases where this is not possible to satisfy or requires sacrificing other > constraints such as region count balancing on RS. We want to keep the per > RS/host check for availability of regions, especially for the meta region. We > haven't heard of problems with racks so far. The cost functions will still be > considered during balancing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26327) Replicas cohosted on a rack shouldn't keep triggering Balancer
Clara Xiong created HBASE-26327: --- Summary: Replicas cohosted on a rack shouldn't keep triggering Balancer Key: HBASE-26327 URL: https://issues.apache.org/jira/browse/HBASE-26327 Project: HBase Issue Type: Sub-task Components: Balancer Reporter: Clara Xiong Currently, the balancer has a shortcut check for cohosted replicas of the same region on a host or rack and will keep triggering the balancer as long as the count is non-zero. With the trend toward Kubernetes and cloud solutions for HBase, operators don't have full control of the topology or are not even aware of it. There are cases where this is not possible to satisfy or requires sacrificing other constraints such as region count balancing on RS. We want to keep the per RS/host check for availability of regions, especially for the meta region. We haven't heard of problems with racks so far. The cost functions will still be considered during balancing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
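The proposed change to the shortcut check can be sketched as a small decision function. This is only an illustration of the idea in the description; the function name, its parameters and the `check_rack` switch are hypothetical and are not taken from the HBase code.

```python
def needs_forced_balance(cohosted_on_host: int,
                         cohosted_on_rack: int,
                         check_rack: bool) -> bool:
    """Return True when the shortcut should force a balancer run.

    cohosted_on_host / cohosted_on_rack: number of replicas of the same
    region found on the same host / rack (hypothetical inputs).
    check_rack: whether rack-level cohosting still forces a run.
    """
    if cohosted_on_host > 0:
        # Per-host check is kept for region availability.
        return True
    # Rack-level cohosting only forces a run when explicitly enabled.
    return check_rack and cohosted_on_rack > 0


# Rack-only cohosting: the current behavior forces a run, the proposal does not.
print(needs_forced_balance(0, 3, check_rack=True))   # True
print(needs_forced_balance(0, 3, check_rack=False))  # False
```

In this sketch the rack-level cost function is untouched; only the shortcut that forces a run is relaxed, which matches the last sentence of the description.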
[jira] [Comment Edited] (HBASE-26309) Balancer tends to move regions to the server at the end of list
[ https://issues.apache.org/jira/browse/HBASE-26309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423416#comment-17423416 ] Clara Xiong edited comment on HBASE-26309 at 10/1/21, 9:57 PM: --- When balancer has to satisfy other constraints, even region count distribution just cannot be guaranteed, as in existing test case TestStochasticLoadBalancerRegionReplicaWithRacks. Because replica distribution has much higher weight than region count skew, the rack with fewer servers tend to get more regions than those with more servers. In this test case, server 0 and 1 are on the same rack while 2 and 3 are on each's rack because servers cannot be place completely evenly. The resulted region count distribution can be [2,2, 4, 4] or be[1, 3, 4, 4]so that we have no replicas of the same region on the first rack. So we have to have fewer regions per server on the first two servers. With the current algorithm, the costs of two plan are the same for region count skew because only linear deviation to ideal average is considered. It can get much more extreme when we have 5 servers for this test case: [1,3,3,3,5]or [2,2,3,3,5] depending on the random walk. But since the algorithm says they are the same expense for region count skew, balancer can be stuck at the former. The more servers we have, as long as the RS counts are not completely even, which happens all the time, the more variation of results we will see depending the random walk. But once we reach the extreme case, balancer is stuck because the cost function says moving doesn't gain. I am proposing using the sum of square of deviation for load functions, inline with replica cost functions. we don't need standard deviation so we can keep it simple and fast. was (Author: claraxiong): When balancer has to satisfy other constraints, even region count distribution just cannot be guaranteed, as in existing test case TestStochasticLoadBalancerRegionReplicaWithRacks. 
Because replica distribution has much higher weight than region count skew, the rack with fewer servers tend to get more regions than those with more servers. In this test case, server 0 and 1 are on the same rack while 2 and 3 are on each's rack because servers cannot be place completely evenly. The resulted region count distribution can be [2,2, 4, 4] or be[1, 3, 4, 4]so that we have no replicas of the same region on the first rack. So we have to have fewer regions per server on the first two servers. With the current algorithm, the costs of two plan are the same for region count skew because only linear deviation to ideal average is considered. It can get much more extreme when we have 8 servers for this test case: [1,3,3,3,5]or [2,2,3,3,5] depending on the random walk. But since the algorithm says they are the same expense for region count skew, balancer can be stuck at the former. The more servers we have, as long as the RS counts are not completely even, which happens all the time, the more variation of results we will see depending the random walk. But once we reach the extreme case, balancer is stuck because the cost function says moving doesn't gain. I am proposing using the sum of square of deviation for load functions, inline with replica cost functions. we don't need standard deviation so we can keep it simple and fast. > Balancer tends to move regions to the server at the end of list > --- > > Key: HBASE-26309 > URL: https://issues.apache.org/jira/browse/HBASE-26309 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > When we have cohosted replicas, cohosted replica distribution has priority to > region count skew. So we won't see region count getting balanced until the > replicas distributed. During this time, master UI shows a drift of regions to > the server at the end of list and causes significant overload. 
This shows a > bias of random region pick and needs to be addressed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-26309) Balancer tends to move regions to the server at the end of list
[ https://issues.apache.org/jira/browse/HBASE-26309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423416#comment-17423416 ] Clara Xiong commented on HBASE-26309: - When the balancer has to satisfy other constraints, an even region count distribution simply cannot be guaranteed, as in the existing test case TestStochasticLoadBalancerRegionReplicaWithRacks. Because replica distribution has a much higher weight than region count skew, the rack with fewer servers tends to get more regions than those with more servers. In this test case, servers 0 and 1 are on the same rack while servers 2 and 3 are each on their own rack, because servers cannot be placed completely evenly. The resulting region count distribution can be [2, 2, 4, 4] or [1, 3, 4, 4], so that we have no replicas of the same region on the first rack. So we have to have fewer regions per server on the first two servers. With the current algorithm, the costs of the two plans are the same for region count skew because only the linear deviation from the ideal average is considered. It can get much more extreme when we have 5 servers for this test case: [1, 3, 3, 3, 5] or [2, 2, 3, 3, 5], depending on the random walk. But since the algorithm says they have the same cost for region count skew, the balancer can be stuck at the former. The more servers we have, the more variation in results we will see depending on the random walk. But once we reach the extreme case, the balancer is stuck because the cost function says moving doesn't gain anything. I am proposing using the sum of squares of the deviations for load functions, in line with the replica cost functions. We don't need the standard deviation, so we can keep it simple and fast. 
> Balancer tends to move regions to the server at the end of list > --- > > Key: HBASE-26309 > URL: https://issues.apache.org/jira/browse/HBASE-26309 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > When we have cohosted replicas, cohosted replica distribution takes priority over > region count skew. So we won't see region count getting balanced until the > replicas are distributed. During this time, master UI shows a drift of regions to > the server at the end of the list, which causes significant overload. This shows a > bias in the random region pick and needs to be addressed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
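The point about linear versus squared deviation in the comment above can be verified directly on the distributions it quotes. This is a minimal sketch with hypothetical helper names, not the balancer's cost function code, and it leaves out any normalization.

```python
def linear_deviation(counts):
    """Sum of absolute deviations from the mean region count."""
    mean = sum(counts) / len(counts)
    return sum(abs(c - mean) for c in counts)


def squared_deviation(counts):
    """Sum of squared deviations from the mean region count."""
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts)


for plan in ([2, 2, 4, 4], [1, 3, 4, 4], [2, 2, 3, 3, 5], [1, 3, 3, 3, 5]):
    print(plan, linear_deviation(plan), squared_deviation(plan))
# [2, 2, 4, 4]    linear 4.0, squared 4.0
# [1, 3, 4, 4]    linear 4.0, squared 6.0
# [2, 2, 3, 3, 5] linear 4.0, squared 6.0
# [1, 3, 3, 3, 5] linear 4.0, squared 8.0
```

The linear measure cannot tell the plans in each pair apart, while the squared measure ranks the more lopsided plan as more expensive, so a move away from it would register as a gain.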
[jira] [Updated] (HBASE-26309) Balancer tends to move regions to the server at the end of list
[ https://issues.apache.org/jira/browse/HBASE-26309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-26309: Description: When we have cohosted replicas, cohosted replica distribution takes priority over region count skew. So we won't see region count getting balanced until the replicas are distributed. During this time, master UI shows a drift of regions to the server at the end of the list, which causes significant overload. This shows a bias in the random region pick and needs to be addressed. (was: When we have a cohosted regions, cohosted replica distribution has priority to region count skew. So we won't see region count getting balanced until the replicas distributed. During this time, master UI shows a drift of regions to the server at the end of list and causes significant overload. This shows a bias of random region pick and needs to be addressed.) > Balancer tends to move regions to the server at the end of list > --- > > Key: HBASE-26309 > URL: https://issues.apache.org/jira/browse/HBASE-26309 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > When we have cohosted replicas, cohosted replica distribution takes priority over > region count skew. So we won't see region count getting balanced until the > replicas are distributed. During this time, master UI shows a drift of regions to > the server at the end of the list, which causes significant overload. This shows a > bias in the random region pick and needs to be addressed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-25625) StochasticBalancer CostFunctions needs a better way to evaluate resource distribution
[ https://issues.apache.org/jira/browse/HBASE-25625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423413#comment-17423413 ] Clara Xiong edited comment on HBASE-25625 at 10/1/21, 8:14 PM: --- The problem surfaced again on a few different clusters where balancer keeps getting triggered by cohosted replicas. https://issues.apache.org/jira/browse/HBASE-26309 When balancer has to satisfy other constraints, even region count distribution just cannot be guaranteed, as in existing test case TestStochasticLoadBalancerRegionReplicaWithRacks. Because replica distribution has much higher weight than region count skew, the rack with fewer servers tend to get more regions than those with more servers. In this test case, server 0 and 1 are on the same rack while 2 and 3 are on each's rack because servers cannot be place completely evenly. The resulted region count distribution can be [2,2, 4, 4] or be[1, 3, 4, 4]so that we have no replicas of the same region on the first rack. So we have to have fewer regions per server on the first two servers. With the current algorithm, the costs of two plan are the same for region count skew because only linear deviation to ideal average is considered. It can get much more extreme when we have 8 servers for this test case: [1,3,3,3,5]or [2,2,3,3,5] depending on the random walk. But since the algorithm says they are the same expense for region count skew, balancer can be stuck at the former. The more servers we have, as long as the RS counts are not completely even, which happens all the time, the more variation of results we will see depending the random walk. But once we reach the extreme case, balancer is stuck because the cost function says moving doesn't gain. I am proposing using the sum of square of deviation for load functions, inline with replica cost functions. we don't need standard deviation so we can keep it simple and fast. 
was (Author: claraxiong): The problem surfaced again on a few different clusters where balancer keeps getting triggered by cohosted replicas. https://issues.apache.org/jira/browse/HBASE-26309 When balancer has to satisfy other constraints, even region count distribution just cannot be guaranteed, as in existing test case TestStochasticLoadBalancerRegionReplicaWithRacks. Because replica distribution has much higher weight than region count skew, the rack with fewer servers tend to get more regions than those with more servers. In this test case, server 0 and 1 are on the same rack while 2 and 3 are on each's rack because servers cannot be place completely evenly. The resulted region count distribution can be [2,2, 4, 4] or be[1, 3, 4, 4]so that we have no replicas of the same region on the first rack. So we have to have fewer regions per server on the first two servers. With the current algorithm, the costs of two plan are the same for region count skew because only linear deviation to ideal average is considered. It can get much more extreme when we have 8 servers for this test case: [1,3,3,3,5]or [2,2,3,3,5] depending on the random walk. But since the algorithm says they are the same expense for region count skew, balancer can be stuck at the former. The more servers we have, the more variation of results we will see depending the random walk. But once we reach the extreme case, balancer is stuck because the cost function says moving doesn't gain. I am proposing using the sum of square of deviation for load functions, inline with replica cost functions. we don't need standard deviation so we can keep it simple and fast. 
> StochasticBalancer CostFunctions needs a better way to evaluate resource > distribution > - > > Key: HBASE-25625 > URL: https://issues.apache.org/jira/browse/HBASE-25625 > Project: HBase > Issue Type: Improvement > Components: Balancer, master >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > Currently CostFunctions including RegionCountSkewCostFunctions, > PrimaryRegionCountSkewCostFunctions and all load cost functions calculate the > unevenness of the distribution by getting the sum of deviation per region > server. This simple implementation works when the cluster is small. But when > the cluster get larger with more region servers and regions, it doesn't work > well with hot spots or a small number of unbalanced servers. The proposal is > to use the standard deviation of the count per region server to capture the > existence of a small portion of region servers with overwhelming > load/allocation. > TableSkewCostFunction uses the sum of the max deviation region per server for > all tables as the measure of unevenness. It doesn't work in a very common > scenario in operations.
[jira] [Commented] (HBASE-25625) StochasticBalancer CostFunctions needs a better way to evaluate resource distribution
[ https://issues.apache.org/jira/browse/HBASE-25625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423413#comment-17423413 ] Clara Xiong commented on HBASE-25625: - The problem surfaced again on a few different clusters where the balancer keeps getting triggered by cohosted replicas. https://issues.apache.org/jira/browse/HBASE-26309 When the balancer has to satisfy other constraints, an even region count distribution simply cannot be guaranteed, as in the existing test case TestStochasticLoadBalancerRegionReplicaWithRacks. Because replica distribution has a much higher weight than region count skew, the rack with fewer servers tends to get more regions than those with more servers. In this test case, servers 0 and 1 are on the same rack while servers 2 and 3 are each on their own rack, because servers cannot be placed completely evenly. The resulting region count distribution can be [2, 2, 4, 4] or [1, 3, 4, 4], so that we have no replicas of the same region on the first rack. So we have to have fewer regions per server on the first two servers. With the current algorithm, the costs of the two plans are the same for region count skew because only the linear deviation from the ideal average is considered. It can get much more extreme when we have 5 servers for this test case: [1, 3, 3, 3, 5] or [2, 2, 3, 3, 5], depending on the random walk. But since the algorithm says they have the same cost for region count skew, the balancer can be stuck at the former. The more servers we have, the more variation in results we will see depending on the random walk. But once we reach the extreme case, the balancer is stuck because the cost function says moving doesn't gain anything. I am proposing using the sum of squares of the deviations for load functions, in line with the replica cost functions. We don't need the standard deviation, so we can keep it simple and fast. 
> StochasticBalancer CostFunctions needs a better way to evaluate resource > distribution > - > > Key: HBASE-25625 > URL: https://issues.apache.org/jira/browse/HBASE-25625 > Project: HBase > Issue Type: Improvement > Components: Balancer, master >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > Currently, cost functions including RegionCountSkewCostFunction, > PrimaryRegionCountSkewCostFunction and all load cost functions calculate the > unevenness of the distribution by summing the deviation per region > server. This simple implementation works when the cluster is small. But when > the cluster gets larger, with more region servers and regions, it doesn't work > well with hot spots or a small number of unbalanced servers. The proposal is > to use the standard deviation of the count per region server to capture the > existence of a small portion of region servers with overwhelming > load/allocation. > TableSkewCostFunction uses the sum, over all tables, of the maximum per-server > deviation in region count as the measure of unevenness. It doesn't work in a very common > scenario in operations. Say we have 100 regions on 50 nodes, two on each. We > add 50 new nodes and they have 0 each. The max deviation from the mean is 1, > compared to 99 in the worst case scenario of 100 regions on a single server. > The normalized cost is 1/99 = 0.010 < default threshold of 0.05. The balancer > wouldn't move. The proposal is to use the standard deviation of the count > per region server to detect this scenario, generating a cost of 3.1/31 = 0.1 > in this case. > Patch is in test and will follow shortly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-26308) Sum of multiplier of cost functions is not populated properly when we have a shortcut for trigger
[ https://issues.apache.org/jira/browse/HBASE-26308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422549#comment-17422549 ] Clara Xiong commented on HBASE-26308: - https://github.com/apache/hbase/pull/3710 > Sum of multiplier of cost functions is not populated properly when we have a > shortcut for trigger > - > > Key: HBASE-26308 > URL: https://issues.apache.org/jira/browse/HBASE-26308 > Project: HBase > Issue Type: Bug > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Critical > > We have a couple of scenarios in which we force balancing: > * idle servers > * co-hosted regions > The code path quits before populating the sum of the multipliers of the cost functions. > This causes a wrong value to be reported in logging. As shown below, the weighted average > is not divided by the total weight. This causes inconsistent logs across iterations. > {quote}2021-09-24 21:46:57,881 INFO > org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Running > balancer because at least one server hosts replicas of the same region. 
> 2021-09-24 21:46:57,881 INFO > org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Start > StochasticLoadBalancer.balancer, initial weighted average > imbalance=6389.260497305375, functionCost=RegionCountSkewCostFunction : > (multiplier=500.0, imbalance=0.06659036267913739); > PrimaryRegionCountSkewCostFunction : (multiplier=500.0, > imbalance=0.05296760285663541); MoveCostFunction : (multiplier=7.0, > imbalance=0.0, balanced); ServerLocalityCostFunction : (multiplier=25.0, > imbalance=0.46286750487559114); RackLocalityCostFunction : (multiplier=15.0, > imbalance=0.2569525347374165); TableSkewCostFunction : (multiplier=500.0, > imbalance=0.3760689783169534); RegionReplicaHostCostFunction : > (multiplier=10.0, imbalance=0.0553889913899139); > RegionReplicaRackCostFunction : (multiplier=1.0, > imbalance=0.05854089790897909); ReadRequestCostFunction : (multiplier=5.0, > imbalance=0.06969346106898068); WriteRequestCostFunction : (multiplier=5.0, > imbalance=0.07834116112410174); MemStoreSizeCostFunction : (multiplier=5.0, > imbalance=0.12533769793201735); StoreFileCostFunction : (multiplier=5.0, > imbalance=0.06921401085082914); computedMaxSteps=5577401600 > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26311) Balancer gets stuck in cohosted replica distribution
[ https://issues.apache.org/jira/browse/HBASE-26311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-26311: Description: In production, we found a corner case where balancer cannot make progress when there is cohosted replica. This is repro'ed on master branch using test added in HBASE-26310. The two cost functions isn't provide proper evaluation so balancer could make progress. Another observation is the imbalance weight is not updated by the cost functions properly during plan generation. The subsequent run reports much high imbalance. {quote}2021-09-24 22:26:56,039 INFO org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Finished computing new moving plan. Computation took 241 ms to try 1284702 different iterations. Found a solution that moves 6941 regions; Going from a computed imbalance of 6389.260497305375 to a new imbalance of 21.03904901349833. 2021-09-24 22:33:40,961 INFO org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Running balancer because at least one server hosts replicas of the same region. 
2021-09-24 22:33:40,961 INFO org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Start S*tocha*sticLoadBalancer.balancer, initial weighted average imbalance=6726.357026325619, functionCost=RegionCountSkewCostFunction : (multiplier=500.0, imbalance=0.07721156356401288); PrimaryRegionCountSkewCostFunction : (multiplier=500.0, imbalance=0.06298215530179263); MoveCostFunction : (multiplier=7.0, imbalance=0.0, balanced); ServerLocalityCostFunction : (multiplier=25.0, imbalance=0.463289517245148); RackLocalityCostFunction : (multiplier=15.0, imbalance=0.25670928199727017); TableSkewCostFunction : (multiplier=500.0, imbalance=0.4378048676389543); RegionReplicaHostCostFunction : (multiplier=10.0, imbalance=0.05809798270893372); RegionReplicaRackCostFunction : (multiplier=1.0, imbalance=0.061018251681075886); ReadRequestCostFunction : (multiplier=5.0, imbalance=0.08235908576054465); WriteRequestCostFunction : (multiplier=5.0, imbalance=0.09385090828285425); MemStoreSizeCostFunction : (multiplier=5.0, imbalance=0.1327376982847744); StoreFileCostFunction : (multiplier=5.0, imbalance=0.07986594927573858); computedMaxSteps=5579331200 {quote} was:In production, we found a corner case where balancer cannot make progress when there is cohosted replica. This is reproed on master branch using test added in HBASE-26310. The cost function isn't provide proper evaluation so balancer could make progress. > Balancer gets stuck in cohosted replica distribution > > > Key: HBASE-26311 > URL: https://issues.apache.org/jira/browse/HBASE-26311 > Project: HBase > Issue Type: Bug > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > In production, we found a corner case where balancer cannot make progress > when there is cohosted replica. This is repro'ed on master branch using test > added in HBASE-26310. The two cost functions isn't provide proper evaluation > so balancer could make progress. 
> > Another observation is the imbalance weight is not updated by the cost > functions properly during plan generation. The subsequent run reports much > high imbalance. > {quote}2021-09-24 22:26:56,039 INFO > org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Finished > computing new moving plan. Computation took 241 ms to try 1284702 > different iterations. Found a solution that moves 6941 regions; Going from a > computed imbalance of 6389.260497305375 to a new imbalance of > 21.03904901349833. > 2021-09-24 22:33:40,961 INFO > org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Running > balancer because at least one server hosts replicas of the same region. > 2021-09-24 22:33:40,961 INFO > org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Start > S*tocha*sticLoadBalancer.balancer, initial weighted average > imbalance=6726.357026325619, functionCost=RegionCountSkewCostFunction : > (multiplier=500.0, imbalance=0.07721156356401288); > PrimaryRegionCountSkewCostFunction : (multiplier=500.0, > imbalance=0.06298215530179263); MoveCostFunction : (multiplier=7.0, > imbalance=0.0, balanced); ServerLocalityCostFunction : (multiplier=25.0, > imbalance=0.463289517245148); RackLocalityCostFunction : (multiplier=15.0, > imbalance=0.25670928199727017); TableSkewCostFunction : (multiplier=500.0, > imbalance=0.4378048676389543); RegionReplicaHostCostFunction : > (multiplier=10.0, imbalance=0.05809798270893372); > RegionReplicaRackCostFunction : (multiplier=1.0, > imbalance=0.061018251681075886); ReadRequestCostFunction : (multiplier=5.0, > imbalance=0.08235908576054465);
[jira] [Assigned] (HBASE-26311) Balancer gets stuck in cohosted replica distribution
[ https://issues.apache.org/jira/browse/HBASE-26311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong reassigned HBASE-26311: --- Assignee: Clara Xiong > Balancer gets stuck in cohosted replica distribution > > > Key: HBASE-26311 > URL: https://issues.apache.org/jira/browse/HBASE-26311 > Project: HBase > Issue Type: Bug > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > In production, we found a corner case where the balancer cannot make progress > when there are cohosted replicas. This is reproduced on the master branch using the test > added in HBASE-26310. The cost function does not provide a proper evaluation, so > the balancer cannot make progress. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HBASE-26310) Repro balancer behavior during iterations
[ https://issues.apache.org/jira/browse/HBASE-26310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong reassigned HBASE-26310: --- Assignee: Clara Xiong > Repro balancer behavior during iterations > - > > Key: HBASE-26310 > URL: https://issues.apache.org/jira/browse/HBASE-26310 > Project: HBase > Issue Type: Bug > Components: Balancer, test >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > All existing tests expect balancer to complete in one run. This misses > temporary imbalance or inefficiency during iterations on large clusters. > Recently we found such issues for cohosted region distribution. To repro the > problem and validate fixes, we need a test to simulate multiple iterations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26311) Balancer gets stuck in cohosted replica distribution
Clara Xiong created HBASE-26311: --- Summary: Balancer gets stuck in cohosted replica distribution Key: HBASE-26311 URL: https://issues.apache.org/jira/browse/HBASE-26311 Project: HBase Issue Type: Bug Components: Balancer Reporter: Clara Xiong In production, we found a corner case where the balancer cannot make progress when there are cohosted replicas. This is reproduced on the master branch using the test added in HBASE-26310. The cost function does not provide a proper evaluation, so the balancer cannot make progress. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26309) Balancer tends to move regions to the server at the end of list
[ https://issues.apache.org/jira/browse/HBASE-26309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-26309: Issue Type: Improvement (was: Bug) > Balancer tends to move regions to the server at the end of list > --- > > Key: HBASE-26309 > URL: https://issues.apache.org/jira/browse/HBASE-26309 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > When we have cohosted regions, cohosted replica distribution takes priority > over region count skew, so we won't see the region count getting balanced until > the replicas are distributed. During this time, the master UI shows regions drifting > to the server at the end of the list, which causes significant overload. > This shows a bias in the random region pick and needs to be addressed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26310) Repro balancer behavior during iterations
[ https://issues.apache.org/jira/browse/HBASE-26310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-26310: Parent: (was: HBASE-26309) Issue Type: Bug (was: Sub-task) > Repro balancer behavior during iterations > - > > Key: HBASE-26310 > URL: https://issues.apache.org/jira/browse/HBASE-26310 > Project: HBase > Issue Type: Bug >Reporter: Clara Xiong >Priority: Major > > All existing tests expect balancer to complete in one run. This misses > temporary imbalance or inefficiency during iterations on large clusters. > Recently we found such issues for cohosted region distribution. To repro the > problem and validate fixes, we need a test to simulate multiple iterations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26310) Repro balancer behavior during iterations
[ https://issues.apache.org/jira/browse/HBASE-26310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-26310: Component/s: test Balancer > Repro balancer behavior during iterations > - > > Key: HBASE-26310 > URL: https://issues.apache.org/jira/browse/HBASE-26310 > Project: HBase > Issue Type: Bug > Components: Balancer, test >Reporter: Clara Xiong >Priority: Major > > All existing tests expect balancer to complete in one run. This misses > temporary imbalance or inefficiency during iterations on large clusters. > Recently we found such issues for cohosted region distribution. To repro the > problem and validate fixes, we need a test to simulate multiple iterations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26309) Balancer tends to move regions to the server at the end of list
[ https://issues.apache.org/jira/browse/HBASE-26309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-26309: Summary: Balancer tends to move regions to the server at the end of list (was: Balancer tend to move regions to the server at the end of list) > Balancer tends to move regions to the server at the end of list > --- > > Key: HBASE-26309 > URL: https://issues.apache.org/jira/browse/HBASE-26309 > Project: HBase > Issue Type: Bug > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > When we have cohosted regions, cohosted replica distribution takes priority > over region count skew, so we won't see the region count getting balanced until > the replicas are distributed. During this time, the master UI shows regions drifting > to the server at the end of the list, which causes significant overload. > This shows a bias in the random region pick and needs to be addressed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26310) Repro balancer behavior during iterations
Clara Xiong created HBASE-26310: --- Summary: Repro balancer behavior during iterations Key: HBASE-26310 URL: https://issues.apache.org/jira/browse/HBASE-26310 Project: HBase Issue Type: Sub-task Reporter: Clara Xiong All existing tests expect balancer to complete in one run. This misses temporary imbalance or inefficiency during iterations on large clusters. Recently we found such issues for cohosted region distribution. To repro the problem and validate fixes, we need a test to simulate multiple iterations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
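The idea of a test that drives the balancer through several runs, rather than expecting one run to finish the job, might look like the following toy sketch. Everything here is hypothetical: `runOnce` stands in for a single balancer invocation with a cap on moves per run, and none of these names are HBase test utilities.

```java
// Toy model only: a capped "balancer run" and a loop that repeats it until stable.
final class MultiRunBalancerSketch {

  /**
   * Hypothetical single run: moves at most maxMoves regions, one at a time,
   * from the most loaded server to the least loaded one. Returns the number of moves.
   */
  static int runOnce(int[] regionsPerServer, int maxMoves) {
    int moves = 0;
    while (moves < maxMoves) {
      int hi = 0;
      int lo = 0;
      for (int i = 1; i < regionsPerServer.length; i++) {
        if (regionsPerServer[i] > regionsPerServer[hi]) hi = i;
        if (regionsPerServer[i] < regionsPerServer[lo]) lo = i;
      }
      if (regionsPerServer[hi] - regionsPerServer[lo] <= 1) {
        break; // already as even as integer counts allow
      }
      regionsPerServer[hi]--;
      regionsPerServer[lo]++;
      moves++;
    }
    return moves;
  }

  /** Repeats runOnce until a run makes no moves or maxRuns is reached; returns productive runs. */
  static int runsUntilStable(int[] regionsPerServer, int maxMoves, int maxRuns) {
    int runs = 0;
    while (runs < maxRuns && runOnce(regionsPerServer, maxMoves) > 0) {
      runs++;
    }
    return runs;
  }
}
```

A test built this way can assert on the intermediate state after each run as well as on the final distribution, which is where temporary imbalance would show up.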
[jira] [Assigned] (HBASE-26309) Balancer tend to move regions to the server at the end of list
[ https://issues.apache.org/jira/browse/HBASE-26309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong reassigned HBASE-26309: --- Assignee: Clara Xiong > Balancer tend to move regions to the server at the end of list > -- > > Key: HBASE-26309 > URL: https://issues.apache.org/jira/browse/HBASE-26309 > Project: HBase > Issue Type: Bug > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > When we have cohosted regions, cohosted replica distribution takes priority > over region count skew, so we won't see the region count getting balanced until > the replicas are distributed. During this time, the master UI shows regions drifting > to the server at the end of the list, which causes significant overload. > This shows a bias in the random region pick and needs to be addressed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26309) Balancer tend to move regions to the server at the end of list
Clara Xiong created HBASE-26309: --- Summary: Balancer tend to move regions to the server at the end of list Key: HBASE-26309 URL: https://issues.apache.org/jira/browse/HBASE-26309 Project: HBase Issue Type: Bug Components: Balancer Reporter: Clara Xiong When we have cohosted regions, cohosted replica distribution takes priority over region count skew, so we won't see the region count getting balanced until the replicas are distributed. During this time, the master UI shows regions drifting to the server at the end of the list, which causes significant overload. This shows a bias in the random region pick and needs to be addressed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HBASE-26308) Sum of multiplier of cost functions is not populated properly when we have a shortcut for trigger
[ https://issues.apache.org/jira/browse/HBASE-26308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong reassigned HBASE-26308: --- Assignee: Clara Xiong > Sum of multiplier of cost functions is not populated properly when we have a > shortcut for trigger > - > > Key: HBASE-26308 > URL: https://issues.apache.org/jira/browse/HBASE-26308 > Project: HBase > Issue Type: Bug > Components: Balancer >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Critical > > There are a couple of scenarios in which we force balancing: > * idle servers > * co-hosted regions > In these cases the code path exits before populating the sum of the cost function multipliers. > This causes a wrong value to be reported in the logs. As shown below, the weighted average > is not divided by the total weight, which makes the logs inconsistent between iterations. > {quote}2021-09-24 21:46:57,881 INFO > org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Running > balancer because at least one server hosts replicas of the same region. 
> 2021-09-24 21:46:57,881 INFO > org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Start > S*tocha*sticLoadBalancer.balancer, initial weighted average > imbalance=6389.260497305375, functionCost=RegionCountSkewCostFunction : > (multiplier=500.0, imbalance=0.06659036267913739); > PrimaryRegionCountSkewCostFunction : (multiplier=500.0, > imbalance=0.05296760285663541); MoveCostFunction : (multiplier=7.0, > imbalance=0.0, balanced); ServerLocalityCostFunction : (multiplier=25.0, > imbalance=0.46286750487559114); RackLocalityCostFunction : (multiplier=15.0, > imbalance=0.2569525347374165); TableSkewCostFunction : (multiplier=500.0, > imbalance=0.3760689783169534); RegionReplicaHostCostFunction : > (multiplier=10.0, imbalance=0.0553889913899139); > RegionReplicaRackCostFunction : (multiplier=1.0, > imbalance=0.05854089790897909); ReadRequestCostFunction : (multiplier=5.0, > imbalance=0.06969346106898068); WriteRequestCostFunction : (multiplier=5.0, > imbalance=0.07834116112410174); MemStoreSizeCostFunction : (multiplier=5.0, > imbalance=0.12533769793201735); StoreFileCostFunction : (multiplier=5.0, > imbalance=0.06921401085082914); computedMaxSteps=5577401600 > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26308) Sum of multiplier of cost functions is not populated properly when we have a shortcut for trigger
Clara Xiong created HBASE-26308: --- Summary: Sum of multiplier of cost functions is not populated properly when we have a shortcut for trigger Key: HBASE-26308 URL: https://issues.apache.org/jira/browse/HBASE-26308 Project: HBase Issue Type: Bug Components: Balancer Reporter: Clara Xiong We have a couple of scenarios that we force balancing: * idle servers * co-hosted regions The code path quit before populating the sum of multiplier of cost functions. This causes wrong value reported in logging. As below, the weighted average is not divide by total weight. This causes inconsistent log among iterations. {quote}2021-09-24 21:46:57,881 INFO org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Running balancer because at least one server hosts replicas of the same region. 2021-09-24 21:46:57,881 INFO org.apache.hadoop.hbase.master.balancer.S*tocha*sticLoadBalancer: Start S*tocha*sticLoadBalancer.balancer, initial weighted average imbalance=6389.260497305375, functionCost=RegionCountSkewCostFunction : (multiplier=500.0, imbalance=0.06659036267913739); PrimaryRegionCountSkewCostFunction : (multiplier=500.0, imbalance=0.05296760285663541); MoveCostFunction : (multiplier=7.0, imbalance=0.0, balanced); ServerLocalityCostFunction : (multiplier=25.0, imbalance=0.46286750487559114); RackLocalityCostFunction : (multiplier=15.0, imbalance=0.2569525347374165); TableSkewCostFunction : (multiplier=500.0, imbalance=0.3760689783169534); RegionReplicaHostCostFunction : (multiplier=10.0, imbalance=0.0553889913899139); RegionReplicaRackCostFunction : (multiplier=1.0, imbalance=0.05854089790897909); ReadRequestCostFunction : (multiplier=5.0, imbalance=0.06969346106898068); WriteRequestCostFunction : (multiplier=5.0, imbalance=0.07834116112410174); MemStoreSizeCostFunction : (multiplier=5.0, imbalance=0.12533769793201735); StoreFileCostFunction : (multiplier=5.0, imbalance=0.06921401085082914); computedMaxSteps=5577401600 {quote} -- This message was sent by Atlassian Jira 
(v8.3.4#803005)
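The difference between a weighted sum and a weighted average, which is the normalization the report says is skipped on the forced-balancing path, can be sketched as follows. The class and method names are hypothetical, and the sketch makes no attempt to reproduce the exact figures in the quoted log; it only shows why a value that is not divided by the total multiplier is not comparable with one that is.

```java
// Illustrative only: not the StochasticLoadBalancer code path.
final class WeightedImbalanceSketch {

  /** Sum of multiplier * cost over all cost functions. */
  static double weightedSum(double[] multipliers, double[] costs) {
    double sum = 0;
    for (int i = 0; i < multipliers.length; i++) {
      sum += multipliers[i] * costs[i];
    }
    return sum;
  }

  /** Weighted sum divided by the total multiplier, so the result stays on the scale of the costs. */
  static double weightedAverage(double[] multipliers, double[] costs) {
    double totalMultiplier = 0;
    for (double m : multipliers) {
      totalMultiplier += m;
    }
    // If the total was never populated (still 0), there is nothing to divide by.
    return totalMultiplier == 0 ? 0 : weightedSum(multipliers, costs) / totalMultiplier;
  }
}
```

With multipliers {500, 25, 5} and costs {0.1, 0.4, 0.2}, the weighted sum is 61 while the weighted average is 61/530, about 0.115; logging one in some iterations and the other in others would look like a large swing in imbalance.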
[jira] [Updated] (HBASE-26297) Balancer run is improperly triggered by accuracy error of double comparison
[ https://issues.apache.org/jira/browse/HBASE-26297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-26297: Description: {code:java} protected synchronized boolean areSomeRegionReplicasColocated(Cluster c) { regionReplicaHostCostFunction.init(c); if (regionReplicaHostCostFunction.cost() > 0) { return true; } regionReplicaRackCostFunction.init(c); if (regionReplicaRackCostFunction.cost() > 0) { return true; } {code} The values are in double data type. Balancer could get stuck in constant runs and unnecessary moves. {code:java} 2021-09-24 12:02:41,943 INFO org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Running balancer because at least one server hosts replicas of the same region. 2021-09-24 12:01:42,878 INFO org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished computing new moving plan. Computation took 241 ms to try 3048341 different iterations. Found a solution that moves 81 regions; Going from a computed imbalance of 1.7429830473781883E-4 to a new imbalance of 1.6169961756947032E-4. {code} we should use COST_EPSILON instead of 0 for double comparison. was: {code:java} protected synchronized boolean areSomeRegionReplicasColocated(Cluster c) { regionReplicaHostCostFunction.init(c); if (regionReplicaHostCostFunction.cost() > 0) { return true; } regionReplicaRackCostFunction.init(c); if (regionReplicaRackCostFunction.cost() > 0) { return true; } {code} The values are in double data type. we often run into unnecessary runs. {code:java} 2021-09-24 12:02:41,943 INFO org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Running balancer because at least one server hosts replicas of the same region. 2021-09-24 12:01:42,878 INFO org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished computing new moving plan. Computation took 241 ms to try 3048341 different iterations. 
Found a solution that moves 81 regions; Going from a computed imbalance of 1.7429830473781883E-4 to a new imbalance of 1.6169961756947032E-4. {code} we should use COST_EPSILON instead of 0 for double comparison. > Balancer run is improperly triggered by accuracy error of double comparison > --- > > Key: HBASE-26297 > URL: https://issues.apache.org/jira/browse/HBASE-26297 > Project: HBase > Issue Type: Bug > Components: Balancer > Environment: {code:java} > {code} >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > {code:java} > protected synchronized boolean areSomeRegionReplicasColocated(Cluster c) { > regionReplicaHostCostFunction.init(c); > if (regionReplicaHostCostFunction.cost() > 0) { > return true; > } > regionReplicaRackCostFunction.init(c); > if (regionReplicaRackCostFunction.cost() > 0) { > return true; > } > {code} > The values are in double data type. Balancer could get stuck in constant runs > and unnecessary moves. > {code:java} > 2021-09-24 12:02:41,943 INFO > org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Running > balancer because at least one server hosts replicas of the same region. > 2021-09-24 12:01:42,878 INFO > org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished > computing new moving plan. Computation took 241 ms to try 3048341 > different iterations. Found a solution that moves 81 regions; Going from a > computed imbalance of 1.7429830473781883E-4 to a new imbalance of > 1.6169961756947032E-4. > {code} > we should use COST_EPSILON instead of 0 for double comparison. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
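A minimal sketch of the proposed fix, comparing a double cost against a small epsilon instead of 0. The cost functions are modeled as plain `DoubleSupplier`s and the epsilon value shown is an assumption chosen for illustration; the real `COST_EPSILON` constant and cost function classes live in the balancer code and may differ.

```java
import java.util.function.DoubleSupplier;

// Illustrative only: suppliers stand in for the replica host/rack cost functions.
final class ReplicaColocationCheckSketch {

  // Assumed value for this sketch; check the balancer source for the real constant.
  static final double COST_EPSILON = 0.0001;

  /** True only when the cost is meaningfully above zero, not just rounding residue. */
  static boolean exceedsEpsilon(double cost) {
    return cost > COST_EPSILON;
  }

  /** Epsilon-based variant of the colocation check quoted in the issue. */
  static boolean areSomeRegionReplicasColocated(DoubleSupplier hostCost, DoubleSupplier rackCost) {
    return exceedsEpsilon(hostCost.getAsDouble()) || exceedsEpsilon(rackCost.getAsDouble());
  }
}
```

A value such as `0.1 + 0.2 - 0.3` is greater than 0 in double arithmetic, so a `> 0` test would trigger a balancer run on it, whereas the epsilon test treats it as zero.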
[jira] [Assigned] (HBASE-26297) Balancer run is improperly triggered by accuracy error of double comparison
[ https://issues.apache.org/jira/browse/HBASE-26297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong reassigned HBASE-26297: --- Assignee: Clara Xiong > Balancer run is improperly triggered by accuracy error of double comparison > --- > > Key: HBASE-26297 > URL: https://issues.apache.org/jira/browse/HBASE-26297 > Project: HBase > Issue Type: Bug > Components: Balancer > Environment: {code:java} > {code} >Reporter: Clara Xiong >Assignee: Clara Xiong >Priority: Major > > {code:java} > protected synchronized boolean areSomeRegionReplicasColocated(Cluster c) { > regionReplicaHostCostFunction.init(c); > if (regionReplicaHostCostFunction.cost() > 0) { > return true; > } > regionReplicaRackCostFunction.init(c); > if (regionReplicaRackCostFunction.cost() > 0) { > return true; > } > {code} > The values are in double data type. we often run into unnecessary runs. > {code:java} > 2021-09-24 12:02:41,943 INFO > org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Running > balancer because at least one server hosts replicas of the same region. > 2021-09-24 12:01:42,878 INFO > org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished > computing new moving plan. Computation took 241 ms to try 3048341 > different iterations. Found a solution that moves 81 regions; Going from a > computed imbalance of 1.7429830473781883E-4 to a new imbalance of > 1.6169961756947032E-4. > {code} > we should use COST_EPSILON instead of 0 for double comparison. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26297) Balancer run is improperly triggered by accuracy error of double comparison
[ https://issues.apache.org/jira/browse/HBASE-26297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-26297: Description: {code:java} protected synchronized boolean areSomeRegionReplicasColocated(Cluster c) { regionReplicaHostCostFunction.init(c); if (regionReplicaHostCostFunction.cost() > 0) { return true; } regionReplicaRackCostFunction.init(c); if (regionReplicaRackCostFunction.cost() > 0) { return true; } {code} The values are in double data type. we often run into unnecessary runs. {code:java} 2021-09-24 12:02:41,943 INFO org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Running balancer because at least one server hosts replicas of the same region. 2021-09-24 12:01:42,878 INFO org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished computing new moving plan. Computation took 241 ms to try 3048341 different iterations. Found a solution that moves 81 regions; Going from a computed imbalance of 1.7429830473781883E-4 to a new imbalance of 1.6169961756947032E-4. {code} we should use COST_EPSILON instead of 0 for double comparison. was: {code:java} protected synchronized boolean areSomeRegionReplicasColocated(Cluster c) { regionReplicaHostCostFunction.init(c); if (regionReplicaHostCostFunction.cost() > 0) { return true; } regionReplicaRackCostFunction.init(c); if (regionReplicaRackCostFunction.cost() > 0) { return true; } {code} The values are in double data type. we often run into unnecessary runs. {code:java} 2021-09-24 12:02:41,943 INFO org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Running balancer because at least one server hosts replicas of the same region. 2021-09-24 12:01:42,878 INFO org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished computing new moving plan. Computation took 241 ms to try 3048341 different iterations. 
Found a solution that moves 81 regions; Going from a computed imbalance of 1.7429830473781883E-4 to a new imbalance of 1.6169961756947032E-4. {code} As in another PR, we should use a delta minimum value instead of 0 for double comparison. > Balancer run is improperly triggered by accuracy error of double comparison > --- > > Key: HBASE-26297 > URL: https://issues.apache.org/jira/browse/HBASE-26297 > Project: HBase > Issue Type: Bug > Components: Balancer > Environment: {code:java} > {code} >Reporter: Clara Xiong >Priority: Major > > {code:java} > protected synchronized boolean areSomeRegionReplicasColocated(Cluster c) { > regionReplicaHostCostFunction.init(c); > if (regionReplicaHostCostFunction.cost() > 0) { > return true; > } > regionReplicaRackCostFunction.init(c); > if (regionReplicaRackCostFunction.cost() > 0) { > return true; > } > {code} > The values are in double data type. we often run into unnecessary runs. > {code:java} > 2021-09-24 12:02:41,943 INFO > org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Running > balancer because at least one server hosts replicas of the same region. > 2021-09-24 12:01:42,878 INFO > org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished > computing new moving plan. Computation took 241 ms to try 3048341 > different iterations. Found a solution that moves 81 regions; Going from a > computed imbalance of 1.7429830473781883E-4 to a new imbalance of > 1.6169961756947032E-4. > {code} > we should use COST_EPSILON instead of 0 for double comparison. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26297) Balancer run is improperly triggered by accuracy error of double comparison
Clara Xiong created HBASE-26297: --- Summary: Balancer run is improperly triggered by accuracy error of double comparison Key: HBASE-26297 URL: https://issues.apache.org/jira/browse/HBASE-26297 Project: HBase Issue Type: Bug Components: Balancer Environment: {code:java} {code} Reporter: Clara Xiong {code:java} protected synchronized boolean areSomeRegionReplicasColocated(Cluster c) { regionReplicaHostCostFunction.init(c); if (regionReplicaHostCostFunction.cost() > 0) { return true; } regionReplicaRackCostFunction.init(c); if (regionReplicaRackCostFunction.cost() > 0) { return true; } {code} The values are in double data type. we often run into unnecessary runs. {code:java} 2021-09-24 12:02:41,943 INFO org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Running balancer because at least one server hosts replicas of the same region. 2021-09-24 12:01:42,878 INFO org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer: Finished computing new moving plan. Computation took 241 ms to try 3048341 different iterations. Found a solution that moves 81 regions; Going from a computed imbalance of 1.7429830473781883E-4 to a new imbalance of 1.6169961756947032E-4. {code} As in another PR, we should use a delta minimum value instead of 0 for double comparison. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-26178) Improve data structure and algorithm for BalanceClusterState to improve computation speed for large cluster
[ https://issues.apache.org/jira/browse/HBASE-26178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415108#comment-17415108 ] Clara Xiong edited comment on HBASE-26178 at 9/14/21, 7:08 PM: --- [https://github.com/apache/hbase/pull/3682] New patch using Agrona for primitive collections. Performance evaluation results have been added to the design doc. was (Author: claraxiong): [https://github.com/apache/hbase/pull/3682] New patch using Agrona for primitive collections. > Improve data structure and algorithm for BalanceClusterState to improve > computation speed for large cluster > --- > > Key: HBASE-26178 > URL: https://issues.apache.org/jira/browse/HBASE-26178 > Project: HBase > Issue Type: Bug >Reporter: Clara Xiong >Priority: Major > > With ~800 nodes and ~500 regions per node on our large production cluster, the > balancer cannot complete within hours, even after we add just 2% more servers after > maintenance. > Unit tests with larger numbers of regions are also taking longer and longer > with recent changes to the balancer, as is evident from the time-limit increases > included in recent PRs. > It is time to replace some of the data structures for better time complexity, > including: > int[][] regionsPerServer; // serverIndex -> region list > int[][] regionsPerHost; // hostIndex -> list of regions > int[][] regionsPerRack; // rackIndex -> region list > // serverIndex -> sorted list of regions by primary region index > ArrayList> primariesOfRegionsPerServer; > // hostIndex -> sorted list of regions by primary region index > int[][] primariesOfRegionsPerHost; > // rackIndex -> sorted list of regions by primary region index > int[][] primariesOfRegionsPerRack; > Areas of algorithm improvement include: > # O(n) to O(1) time to look up or update per server/host/rack for every > move test iteration (n = number of regions per server/host/rack). > # O(n) to O(1) time for reverse lookup of region index from primary index. 
> # Recomputation of primaryRegionCountSkewCostFunction reduced from O(n) to > O(1) > -- This message was sent by Atlassian Jira (v8.3.4#803005)
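The move from O(n) array scans to O(1) lookups and updates listed above can be illustrated with hash-based collections. This sketch uses boxed `java.util` collections as a stand-in; the ticket mentions Agrona primitive collections, which would avoid the boxing, and the class and method names here are hypothetical rather than taken from the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: a per-server region index with expected O(1) lookup and move.
final class RegionIndexSketch {
  private final List<Set<Integer>> regionsPerServer = new ArrayList<>();
  private final Map<Integer, Integer> serverOfRegion = new HashMap<>();

  RegionIndexSketch(int numServers) {
    for (int i = 0; i < numServers; i++) {
      regionsPerServer.add(new HashSet<>());
    }
  }

  /** Assigns or moves a region without scanning or copying an int[] region list. */
  void assign(int region, int server) {
    Integer previous = serverOfRegion.put(region, server);
    if (previous != null) {
      regionsPerServer.get(previous).remove(region);
    }
    regionsPerServer.get(server).add(region);
  }

  /** Membership test; a linear scan is needed with an unsorted int[] per server. */
  boolean hosts(int server, int region) {
    return regionsPerServer.get(server).contains(region);
  }

  int regionCount(int server) {
    return regionsPerServer.get(server).size();
  }
}
```

Because each candidate move in the stochastic search touches these structures, the per-operation cost is multiplied by the number of iterations, which is why constant-time updates matter at this scale.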
[jira] [Commented] (HBASE-26178) Improve data structure and algorithm for BalanceClusterState to improve computation speed for large cluster
[ https://issues.apache.org/jira/browse/HBASE-26178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17415108#comment-17415108 ] Clara Xiong commented on HBASE-26178: - [https://github.com/apache/hbase/pull/3682] New patch using Agrona for primitive collections. > Improve data structure and algorithm for BalanceClusterState to improve > computation speed for large cluster > --- > > Key: HBASE-26178 > URL: https://issues.apache.org/jira/browse/HBASE-26178 > Project: HBase > Issue Type: Bug >Reporter: Clara Xiong >Priority: Major > > With ~800 nodes and ~500 regions per node on our large production cluster, the > balancer cannot complete within hours, even after we add just 2% more servers after > maintenance. > Unit tests with larger numbers of regions are also taking longer and longer > with recent changes to the balancer, as is evident from the time-limit increases > included in recent PRs. > It is time to replace some of the data structures for better time complexity, > including: > int[][] regionsPerServer; // serverIndex -> region list > int[][] regionsPerHost; // hostIndex -> list of regions > int[][] regionsPerRack; // rackIndex -> region list > // serverIndex -> sorted list of regions by primary region index > ArrayList> primariesOfRegionsPerServer; > // hostIndex -> sorted list of regions by primary region index > int[][] primariesOfRegionsPerHost; > // rackIndex -> sorted list of regions by primary region index > int[][] primariesOfRegionsPerRack; > Areas of algorithm improvement include: > # O(n) to O(1) time to look up or update per server/host/rack for every > move test iteration (n = number of regions per server/host/rack). > # O(n) to O(1) time for reverse lookup of region index from primary index. > # Recomputation of primaryRegionCountSkewCostFunction reduced from O(n) to > O(1) > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-25697) StochasticBalancer improvement for large scale clusters
[ https://issues.apache.org/jira/browse/HBASE-25697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Clara Xiong updated HBASE-25697:
Description:
h2. Findings on a large scale cluster (100,000 regions on 300 nodes)
* Balancer starts and stops before getting a plan
* Adding new racks doesn’t trigger the balancer
* Balancer stops, leaving some racks at 50% lower region counts
* Regions for large tables don’t get evenly distributed
* Observability is poor
* Too many knobs make tuning empirical and require many experiments

h2. Improvements made and being made
* Cost function enhancement to capture outliers, especially table skew. https://issues.apache.org/jira/browse/HBASE-25625?filter=-2
* Explain why the balancer stops: https://issues.apache.org/jira/browse/HBASE-25666 (will backport too), https://issues.apache.org/jira/browse/HBASE-24528

h2. More proposals
* minCostNeedBalance for each cost function instead of weights. We want to trigger balancing if any factor is out of balance, instead of trying to combine the factors with arbitrary weights. This makes operation and configuration much easier.
* Simulated annealing: periodically lower minCostNeedBalance to unstick the balancer from a sub-optimum, then gradually increase it to keep the system stable. Also add the cost of moves as a countermeasure for the decision [https://opensourcelibs.com/lib/tempest]
* Orchestrated scheduling of compaction, normalizer and balancer
* PID approach [https://www.amazon.com/dp/1449361692/ref=rdr_ext_tmb]

was: the same description, except that the second heading read "h2. Improvements made and bing made".

> StochasticBalancer improvement for large scale clusters
> ---
>
> Key: HBASE-25697
> URL: https://issues.apache.org/jira/browse/HBASE-25697
> Project: HBase
> Issue Type: Improvement
> Components: Balancer, master, UI
> Reporter: Clara Xiong
> Priority: Major
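The proposal of a minCostNeedBalance per cost function can be contrasted with a single combined threshold in a short sketch. Everything here is illustrative: the class and method names are hypothetical, the combined trigger is assumed to compare a weighted average against one threshold, and the cost values are made up for the example.

```java
/** Hypothetical sketch comparing a combined trigger with per-function triggers. */
final class PerFunctionThresholdSketch {

  /** Combined trigger (assumed form): weighted average of all costs vs. one threshold. */
  static boolean needsBalanceCombined(double[] costs, double[] weights, double minCostNeedBalance) {
    double total = 0;
    double sumWeights = 0;
    for (int i = 0; i < costs.length; i++) {
      total += costs[i] * weights[i];
      sumWeights += weights[i];
    }
    return sumWeights > 0 && total / sumWeights > minCostNeedBalance;
  }

  /** Proposed trigger: balance as soon as any single cost exceeds its own threshold. */
  static boolean needsBalanceAny(double[] costs, double[] thresholds) {
    for (int i = 0; i < costs.length; i++) {
      if (costs[i] > thresholds[i]) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Illustrative values: region count is well balanced, table skew is not.
    double[] costs = {0.01, 0.30};
    double[] weights = {500, 35};
    double[] thresholds = {0.05, 0.05};
    System.out.println(needsBalanceCombined(costs, weights, 0.05)); // skew is diluted by the large weight
    System.out.println(needsBalanceAny(costs, thresholds));
  }
}
```

In this example the heavily weighted, well-balanced factor dilutes the skewed one, so the combined check stays below the threshold while the per-function check fires. That is the operational argument made in the proposal.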
[jira] [Commented] (HBASE-26252) Add support for reloading balancer configs with BalanceRequest
[ https://issues.apache.org/jira/browse/HBASE-26252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412715#comment-17412715 ]

Clara Xiong commented on HBASE-26252:
-------------------------------------

Thank you for the clarification. The PR looks like a nice cleanup and streamlining. I will review it.

> Add support for reloading balancer configs with BalanceRequest
> ---
>
> Key: HBASE-26252
> URL: https://issues.apache.org/jira/browse/HBASE-26252
> Project: HBase
> Issue Type: Sub-task
> Reporter: Bryan Beaudreault
> Assignee: Bryan Beaudreault
> Priority: Minor
>
> It's currently a pain to iterate on balancer configs. You need to make changes in hbase-site.xml, then find the full ServerName for the active HMaster, then execute {{update_configuration ''}} in the shell, then run the balancer.
> Finding the ServerName is actually quite annoying. The best way I've found is to look at the JMX dump and find {{tag.serverName}}, but that takes a bunch of steps.
> We can make this a good deal more convenient by adding direct support for reloading the balancer configs into the {{balance}} command.
> This could look something like:
> {{shell> balance \{RELOAD_CONFIGS => true}}}
> Alternatively, we could add another string arg like:
> {{shell> balance 'reload_config'}}
> Either way, we'd add a new {{BalanceRequest$Builder#setReloadConfig(boolean)}}.
[jira] [Updated] (HBASE-26023) tableSkewCostFunction aggregate cost per table incorrectly
[ https://issues.apache.org/jira/browse/HBASE-26023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Clara Xiong updated HBASE-26023:
Summary: tableSkewCostFunction aggregate cost per table incorrectly (was: Overhaul of test cluster set up for table skew)

> tableSkewCostFunction aggregate cost per table incorrectly
> ---
>
> Key: HBASE-26023
> URL: https://issues.apache.org/jira/browse/HBASE-26023
> Project: HBase
> Issue Type: Sub-task
> Components: Balancer, test
> Reporter: Clara Xiong
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.6, 2.4.5
>
> There is another bug in the original tableSkew cost function, in how it aggregates the cost per table:
> If we have 10 regions, one per table, evenly distributed on 10 nodes, the cost is scaled to 1.0.
> The more tables we have, the closer the value will be to 1.0. The cost function becomes useless.
> All the balancer tests were set up with large numbers of tables with minimal regions per table. This artificially inflates the total cost and triggers balancer runs. With this fix on TableSkewFunction, we need to overhaul the tests too. We also need to add tests that reflect more diverse scenarios for table distribution, such as large tables with large numbers of regions.
> {code:java}
> protected double cost() {
>   // Upper bound: every region on a single server.
>   double max = cluster.numRegions;
>   // Lower bound: regions spread evenly over all servers.
>   double min = ((double) cluster.numRegions) / cluster.numServers;
>   // Sum, over all tables, of the largest per-server region count of that table.
>   double value = 0;
>   for (int i = 0; i < cluster.numMaxRegionsPerTable.length; i++) {
>     value += cluster.numMaxRegionsPerTable[i];
>   }
>   LOG.info("min = {}, max = {}, cost= {}", min, max, value);
>   return scale(min, max, value);
> }
> {code}
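The aggregation problem in HBASE-26023 can be reproduced with plain numbers. The sketch below mirrors the quoted `cost()` method outside of HBase. The `scale` helper is an assumption (a clamped linear normalization of `value` between `min` and `max`), and the class name is hypothetical.

```java
/** Hypothetical, standalone re-creation of the quoted tableSkew aggregation. */
final class TableSkewAggregationSketch {

  /** Assumed helper: linear normalization of value into [0, 1] between min and max. */
  static double scale(double min, double max, double value) {
    if (max <= min || value <= min) {
      return 0;
    }
    return Math.max(0d, Math.min(1d, (value - min) / (max - min)));
  }

  /** Same arithmetic as the quoted cost(), with the cluster fields passed in as arguments. */
  static double cost(int numRegions, int numServers, int[] numMaxRegionsPerTable) {
    double max = numRegions;
    double min = ((double) numRegions) / numServers;
    double value = 0;
    for (int maxForTable : numMaxRegionsPerTable) {
      value += maxForTable;
    }
    return scale(min, max, value);
  }

  public static void main(String[] args) {
    // 10 tables with one region each, evenly spread over 10 servers:
    // each table's largest per-server count is 1, so value = 10 = max.
    int[] tenSmallTables = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
    System.out.println(cost(10, 10, tenSmallTables));
    // One table with 10 regions, one per server: value = 1 = min.
    System.out.println(cost(10, 10, new int[] {1}));
  }
}
```

Both layouts are perfectly even, yet the first is scored as the worst possible skew and the second as no skew at all. The summed per-table maxima grow with the number of tables, while `min` and `max` are computed from cluster-wide totals.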
[jira] [Commented] (HBASE-25769) Update default weight of cost functions
[ https://issues.apache.org/jira/browse/HBASE-25769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412231#comment-17412231 ]

Clara Xiong commented on HBASE-25769:
-------------------------------------

[~Xiaolin Ha] More updates on the byTable option. We have seen it enabled on a different large cluster (close to 1000 RS), where it also worked poorly. byTable works fine with many small tables and at most one large table on a large cluster. The fix in https://issues.apache.org/jira/browse/HBASE-25739 makes tableSkewCostFunction finally work, and we switched to relying on it by increasing its weight rather than using the byTable option.

> Update default weight of cost functions
> ---
>
> Key: HBASE-25769
> URL: https://issues.apache.org/jira/browse/HBASE-25769
> Project: HBase
> Issue Type: Sub-task
> Components: Balancer
> Reporter: Clara Xiong
> Priority: Major
>
> In production, we have seen some critical big tables that handle the majority of the load. Table skew is becoming more important. With the update of the table skew function, the balancer finally works for large table distribution on a large cluster. We should increase the weight from 35 to a level comparable to region count skew: 500. We could even go further and replace region count skew with table skew, since the latter works in the same way and accounts for region distribution per node.
> Another weight we found helpful to increase is the one for the store file size cost function. Ideally, if the normalizer worked perfectly, we would not need to worry about it, since region count skew would have accounted for it. But we are often in a situation where it doesn't. Store file distribution needs to be given more weight as accommodation. We tested changing it from 5 to 200 and it works fine.
[jira] [Commented] (HBASE-25625) StochasticBalancer CostFunctions needs a better way to evaluate resource distribution
[ https://issues.apache.org/jira/browse/HBASE-25625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412227#comment-17412227 ]

Clara Xiong commented on HBASE-25625:
-------------------------------------

With the release of https://issues.apache.org/jira/browse/HBASE-25739, most of the problems we observed are gone. We will revisit this Jira and the PR as needed.

> StochasticBalancer CostFunctions needs a better way to evaluate resource distribution
> ---
>
> Key: HBASE-25625
> URL: https://issues.apache.org/jira/browse/HBASE-25625
> Project: HBase
> Issue Type: Improvement
> Components: Balancer, master
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Priority: Major
>
> Currently CostFunctions including RegionCountSkewCostFunctions, PrimaryRegionCountSkewCostFunctions and all load cost functions calculate the unevenness of the distribution by summing the deviation per region server. This simple implementation works when the cluster is small. But when the cluster gets larger, with more region servers and regions, it doesn't work well with hot spots or a small number of unbalanced servers. The proposal is to use the standard deviation of the count per region server to capture the existence of a small portion of region servers with overwhelming load/allocation.
> TableSkewCostFunction uses the sum, over all tables, of the maximum deviation of the region count per server as the measure of unevenness. It doesn't work in a very common scenario in operations. Say we have 100 regions on 50 nodes, two on each. We add 50 new nodes and they have 0 each. The max deviation from the mean is 1, compared to 99 in the worst case scenario of 100 regions on a single server. The normalized cost is 1/99 ≈ 0.010 < default threshold of 0.05. Balancer wouldn't move. The proposal is to use the standard deviation of the count per region server to detect this scenario, generating a cost of 3.1/31 = 0.1 in this case.
> Patch is in test and will follow shortly.
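The 100-regions-on-100-nodes scenario from HBASE-25625 can be checked numerically. This is a sketch under stated assumptions: it uses the population standard deviation and normalizes by the worst case (all regions on one server). The intermediate figures quoted in the description (3.1 and 31) are not reproduced by this normalization; only the resulting ratio of roughly 0.1 is. All names are hypothetical.

```java
/** Hypothetical sketch: max-deviation cost vs. standard-deviation cost. */
final class StdDevCostSketch {

  static double mean(int[] counts) {
    double sum = 0;
    for (int c : counts) {
      sum += c;
    }
    return sum / counts.length;
  }

  /** Largest absolute deviation of any server's count from the mean. */
  static double maxDeviation(int[] counts) {
    double m = mean(counts);
    double max = 0;
    for (int c : counts) {
      max = Math.max(max, Math.abs(c - m));
    }
    return max;
  }

  /** Population standard deviation of the per-server counts. */
  static double stdDev(int[] counts) {
    double m = mean(counts);
    double sumSq = 0;
    for (int c : counts) {
      sumSq += (c - m) * (c - m);
    }
    return Math.sqrt(sumSq / counts.length);
  }

  /** Worst case used for normalization: every region on server 0. */
  static int[] worstCase(int totalRegions, int numServers) {
    int[] counts = new int[numServers];
    counts[0] = totalRegions;
    return counts;
  }

  public static void main(String[] args) {
    // 50 old servers with 2 regions each, 50 new servers with none.
    int[] actual = new int[100];
    for (int i = 0; i < 50; i++) {
      actual[i] = 2;
    }
    int[] worst = worstCase(100, 100);
    System.out.println(maxDeviation(actual) / maxDeviation(worst)); // about 0.0101, below 0.05
    System.out.println(stdDev(actual) / stdDev(worst));             // about 0.1005, above 0.05
  }
}
```

The max-deviation ratio stays under the 0.05 threshold mentioned in the description, so the balancer would not run, while the standard-deviation ratio is about ten times larger and crosses it.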
[jira] [Resolved] (HBASE-26023) Overhaul of test cluster set up for table skew
[ https://issues.apache.org/jira/browse/HBASE-26023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong resolved HBASE-26023. - Fix Version/s: 3.0.0-alpha-1 2.3.6 2.4.5 Resolution: Fixed This was fixed in the same PRs for https://issues.apache.org/jira/browse/HBASE-25739
[jira] [Resolved] (HBASE-26237) Improve computation complexity for primaryRegionCountSkewCostFunctio
[ https://issues.apache.org/jira/browse/HBASE-26237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Clara Xiong resolved HBASE-26237.
Fix Version/s: 3.0.0-alpha-2
Resolution: Fixed

> Improve computation complexity for primaryRegionCountSkewCostFunctio
> ---
>
> Key: HBASE-26237
> URL: https://issues.apache.org/jira/browse/HBASE-26237
> Project: HBase
> Issue Type: Sub-task
> Components: Balancer
> Reporter: Clara Xiong
> Priority: Minor
> Fix For: 3.0.0-alpha-2
>
> Recomputation of primaryRegionCountSkewCostFunction can be reduced from O(n) to O(1) by only incrementing the destination and decrementing the source instead of doing a full recompute.
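The incremental recomputation described in HBASE-26237 can be sketched as follows. This is not the HBase implementation: the cost is assumed to be the sum of absolute deviations of per-server primary counts from the mean, and the class and method names are hypothetical. Because a move keeps the total constant, the mean does not change and only the source and destination terms need updating.

```java
/** Hypothetical sketch: O(1) cost update per move instead of an O(n) full recompute. */
final class IncrementalSkewSketch {
  private final int[] primariesPerServer;
  private final double mean;
  private double sumAbsDeviation;

  IncrementalSkewSketch(int[] counts) {
    this.primariesPerServer = counts.clone();
    long total = 0;
    for (int c : counts) {
      total += c;
    }
    this.mean = (double) total / counts.length;
    this.sumAbsDeviation = fullRecompute();
  }

  /** O(n) baseline: walks every server. */
  double fullRecompute() {
    double sum = 0;
    for (int c : primariesPerServer) {
      sum += Math.abs(c - mean);
    }
    return sum;
  }

  /** O(1) update: only the source and destination terms change. */
  void move(int source, int dest) {
    if (source == dest) {
      return;
    }
    sumAbsDeviation -= Math.abs(primariesPerServer[source] - mean)
        + Math.abs(primariesPerServer[dest] - mean);
    primariesPerServer[source]--;
    primariesPerServer[dest]++;
    sumAbsDeviation += Math.abs(primariesPerServer[source] - mean)
        + Math.abs(primariesPerServer[dest] - mean);
  }

  double cost() {
    return sumAbsDeviation;
  }
}
```

A long-running balancer could still call `fullRecompute()` occasionally to guard against accumulated floating-point drift; that is a design choice of this sketch, not something stated in the issue.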
[jira] [Commented] (HBASE-26252) Add support for reloading balancer configs with BalanceRequest
[ https://issues.apache.org/jira/browse/HBASE-26252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412181#comment-17412181 ]

Clara Xiong commented on HBASE-26252:
-------------------------------------

Sorry if I was not clear, [~bbeaudreault]. My question was whether a reload is required for a dry run with changed config. As to the universal usefulness, I am not very convinced. Aren't those configs dynamically reloaded?
[jira] [Commented] (HBASE-26252) Add support for reloading balancer configs with BalanceRequest
[ https://issues.apache.org/jira/browse/HBASE-26252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411552#comment-17411552 ]

Clara Xiong commented on HBASE-26252:
-------------------------------------

Looking over the PR: do you plan to ask operators to update and reload the config even for dry runs? [~bbeaudreault]
[jira] [Commented] (HBASE-26237) Improve computation complexity for primaryRegionCountSkewCostFunctio
[ https://issues.apache.org/jira/browse/HBASE-26237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408300#comment-17408300 ]

Clara Xiong commented on HBASE-26237:
-------------------------------------

[~kulwantsingh011] Thank you for the offer, but this is a spin-off from https://github.com/apache/hbase/pull/3575. I am opening a separate PR.
[jira] [Comment Edited] (HBASE-25697) StochasticBalancer improvement for large scale clusters
[ https://issues.apache.org/jira/browse/HBASE-25697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406982#comment-17406982 ]

Clara Xiong edited comment on HBASE-25697 at 8/31/21, 10:10 PM:

[~idol] Thank you for the suggestion. Could you elaborate on the time window control you have in mind? Were you talking about putting more weight on the most recent data for dynamic load costs such as read requests?

was (Author: claraxiong):
[~idol] Thank you for the suggestion. Could you elaborate on the time window control you have in mind?
[jira] [Updated] (HBASE-26237) Improve computation complexity for primaryRegionCountSkewCostFunctio
[ https://issues.apache.org/jira/browse/HBASE-26237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-26237: Parent: HBASE-25697 Issue Type: Sub-task (was: Bug)
[jira] [Commented] (HBASE-25697) StochasticBalancer improvement for large scale clusters
[ https://issues.apache.org/jira/browse/HBASE-25697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406982#comment-17406982 ] Clara Xiong commented on HBASE-25697: - [~idol] Thank you for the suggestion. Could you elaborate on the time window control you have in mind?
[jira] [Created] (HBASE-26237) Improve computation complexity for primaryRegionCountSkewCostFunctio
Clara Xiong created HBASE-26237:
---

Summary: Improve computation complexity for primaryRegionCountSkewCostFunctio
Key: HBASE-26237
URL: https://issues.apache.org/jira/browse/HBASE-26237
Project: HBase
Issue Type: Bug
Components: Balancer
Reporter: Clara Xiong

Recomputation of primaryRegionCountSkewCostFunction can be reduced from O(n) to O(1) by only incrementing the destination and decrementing the source instead of full recompute.
[jira] [Commented] (HBASE-26178) Improve data structure and algorithm for BalanceClusterState to improve computation speed for large cluster
[ https://issues.apache.org/jira/browse/HBASE-26178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17406483#comment-17406483 ] Clara Xiong commented on HBASE-26178: - [~zhangduo] and [~stack] thank you for your review. Updated to incorporate your feedback.
[jira] [Commented] (HBASE-26178) Improve data structure and algorithm for BalanceClusterState to improve computation speed for large cluster
[ https://issues.apache.org/jira/browse/HBASE-26178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17405517#comment-17405517 ] Clara Xiong commented on HBASE-26178: - [~zhangduo] Updated access. Thank you.
[jira] [Commented] (HBASE-26177) Add support to run balancer overriding current config
[ https://issues.apache.org/jira/browse/HBASE-26177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402292#comment-17402292 ]

Clara Xiong commented on HBASE-26177:
-------------------------------------

I will take it then. Thank you.

> Add support to run balancer overriding current config
> ---
>
> Key: HBASE-26177
> URL: https://issues.apache.org/jira/browse/HBASE-26177
> Project: HBase
> Issue Type: Bug
> Components: Balancer
> Reporter: Clara Xiong
> Priority: Major
>
> At times we want a one-time balancer run with specific aggressive configs, such as when recovering from a failure of half the cluster. Currently some configs are loaded dynamically but some are not, and restoring the configs afterwards can be error prone.
> With a dry run feature coming, we want to let the user run a dry run with the overriding config to choose the config and then run it. The implementation becomes more straightforward if we allow overriding configs for both dry runs and actual runs.
[jira] [Commented] (HBASE-26178) Improve data structure and algorithm for BalanceClusterState to improve computation speed for large cluster
[ https://issues.apache.org/jira/browse/HBASE-26178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17402009#comment-17402009 ] Clara Xiong commented on HBASE-26178: - A simple write up for the design at https://docs.google.com/document/d/1ovpAdKBEDMCWCMhhqN5MJAV_Urzv-UGhIDRu7scSNR0/edit#
[jira] [Created] (HBASE-26207) config hbase.regions.slop is used for stochastic load balancer.
Clara Xiong created HBASE-26207:
---

Summary: config hbase.regions.slop is used for stochastic load balancer.
Key: HBASE-26207
URL: https://issues.apache.org/jira/browse/HBASE-26207
Project: HBase
Issue Type: Bug
Components: Balancer
Reporter: Clara Xiong

The config is used to initialize the slop field of StochasticLoadBalancer, but the field is not used anywhere as of now. A related Jira removed it from the Stochastic Balancer: https://issues.apache.org/jira/browse/HBASE-9310

We can clean up the code.
[jira] [Commented] (HBASE-26177) Add support to run balancer overriding current config
[ https://issues.apache.org/jira/browse/HBASE-26177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17400652#comment-17400652 ] Clara Xiong commented on HBASE-26177: - [~bbeaudreault] do you plan to implement this for dry run, so that dry run can become a powerful tool for operators to evaluate config changes? We found it would be useful for our use case at Apple. > Add support to run balancer overriding current config > - > > Key: HBASE-26177 > URL: https://issues.apache.org/jira/browse/HBASE-26177 > Project: HBase > Issue Type: Bug > Components: Balancer >Reporter: Clara Xiong >Priority: Major > > At times we want a one-time run of the balancer with specific aggressive configs, > such as when recovering from a failure of half the cluster. Currently some configs are > loaded dynamically but some are not, and it could be error prone when we try to > restore the configs. > With a dry run feature coming, we want to let the user run a dry run with the > overriding config to choose the config and then run it. The implementation > becomes more straightforward if we allow overriding configs for both dry > and actual runs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26178) Improve data structure and algorithm for BalanceClusterState to improve computation speed for large cluster
[ https://issues.apache.org/jira/browse/HBASE-26178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-26178: Description: With ~800 node and ~500 regions per node on our large production cluster, balancer cannot complete within hours even after we just add 2% servers after maintenance. The unit tests with larger number of regions are taking longer and longer with changes to balancer with recent changes too, evident with the increment of the time limit recent PR's included. It is time to replace some of the data structure for better time complexity including: int[][] regionsPerServer; // serverIndex -> region list int[][] regionsPerHost; // hostIndex -> list of regions int[][] regionsPerRack; // rackIndex -> region list // serverIndex -> sorted list of regions by primary region index ArrayList> primariesOfRegionsPerServer; // hostIndex -> sorted list of regions by primary region index int[][] primariesOfRegionsPerHost; // rackIndex -> sorted list of regions by primary region index int[][] primariesOfRegionsPerRack; Areas of algorithm improvement include: # O(n ) to O(1) time to lookup or update per server/host/rack for every move test iteration.(n = number of regions per server/host/rack). # O(n ) to O(1) time for reserse lookup of region index from primary index. # Recomputation of primaryRegionCountSkewCostFunction reduced from O(n ) to O(1) was: With ~800 node and ~500 regions per node on our large production cluster, balancer cannot complete within hours even after we just add 2% servers after maintenance. The unit tests with larger number of regions are taking longer and longer with changes to balancer with recent changes too, evident with the increment of the time limit recent PR's included. 
It is time to replace some of the data structure for better time complexity including: int[][] regionsPerServer; // serverIndex -> region list int[][] regionsPerHost; // hostIndex -> list of regions int[][] regionsPerRack; // rackIndex -> region list // serverIndex -> sorted list of regions by primary region index ArrayList> primariesOfRegionsPerServer; // hostIndex -> sorted list of regions by primary region index int[][] primariesOfRegionsPerHost; // rackIndex -> sorted list of regions by primary region index int[][] primariesOfRegionsPerRack; Areas of algorithm improvement include: # O(n) to O(1) time to lookup or update per server/host/rack for every move test iteration.(n = number of regions per server/host/rack). # O(n) to O(1) time for reserse lookup of region index from primary index. # Recomputation of primaryRegionCountSkewCostFunction reduced from O(n) to O(1) > Improve data structure and algorithm for BalanceClusterState to improve > computation speed for large cluster > --- > > Key: HBASE-26178 > URL: https://issues.apache.org/jira/browse/HBASE-26178 > Project: HBase > Issue Type: Bug >Reporter: Clara Xiong >Priority: Major > > With ~800 node and ~500 regions per node on our large production cluster, > balancer cannot complete within hours even after we just add 2% servers after > maintenance. > The unit tests with larger number of regions are taking longer and longer > with changes to balancer with recent changes too, evident with the increment > of the time limit recent PR's included. 
> It is time to replace some of the data structure for better time complexity > including: > int[][] regionsPerServer; // serverIndex -> region list > int[][] regionsPerHost; // hostIndex -> list of regions > int[][] regionsPerRack; // rackIndex -> region list > // serverIndex -> sorted list of regions by primary region index > ArrayList> primariesOfRegionsPerServer; > // hostIndex -> sorted list of regions by primary region index > int[][] primariesOfRegionsPerHost; > // rackIndex -> sorted list of regions by primary region index > int[][] primariesOfRegionsPerRack; > Areas of algorithm improvement include: > # O(n ) to O(1) time to lookup or update per server/host/rack for every > move test iteration.(n = number of regions per server/host/rack). > # O(n ) to O(1) time for reserse lookup of region index from primary index. > # Recomputation of primaryRegionCountSkewCostFunction reduced from O(n ) to > O(1) > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26178) Improve data structure and algorithm for BalanceClusterState to improve computation speed for large cluster
[ https://issues.apache.org/jira/browse/HBASE-26178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-26178: Description: With ~800 node and ~500 regions per node on our large production cluster, balancer cannot complete within hours even after we just add 2% servers after maintenance. The unit tests with larger number of regions are taking longer and longer with changes to balancer with recent changes too, evident with the increment of the time limit recent PR's included. It is time to replace some of the data structure for better time complexity including: int[][] regionsPerServer; // serverIndex -> region list int[][] regionsPerHost; // hostIndex -> list of regions int[][] regionsPerRack; // rackIndex -> region list // serverIndex -> sorted list of regions by primary region index ArrayList> primariesOfRegionsPerServer; // hostIndex -> sorted list of regions by primary region index int[][] primariesOfRegionsPerHost; // rackIndex -> sorted list of regions by primary region index int[][] primariesOfRegionsPerRack; Areas of algorithm improvement include: # O(n) to O(1) time to lookup or update per server/host/rack for every move test iteration.(n = number of regions per server/host/rack). # O(n) to O(1) time for reserse lookup of region index from primary index. # Recomputation of primaryRegionCountSkewCostFunction reduced from O(n) to O(1) was: With ~800 node and ~500 regions per node on our large production cluster, balancer cannot complete within hours even after we just add 2% servers after maintenance. The unit tests with larger number of regions are taking longer and longer with changes to balancer with recent changes too, evident with the increment of the time limit recent PR's included. 
It is time to replace some of the data structure for better time complexity including: int[][] regionsPerServer; // serverIndex -> region list int[][] regionsPerHost; // hostIndex -> list of regions int[][] regionsPerRack; // rackIndex -> region list // serverIndex -> sorted list of regions by primary region index ArrayList> primariesOfRegionsPerServer; // hostIndex -> sorted list of regions by primary region index int[][] primariesOfRegionsPerHost; // rackIndex -> sorted list of regions by primary region index int[][] primariesOfRegionsPerRack; which needs O(n) time to lookup or update per server/host/rack for every move test iteration. (n = number of regions per server/host/rack). > Improve data structure and algorithm for BalanceClusterState to improve > computation speed for large cluster > --- > > Key: HBASE-26178 > URL: https://issues.apache.org/jira/browse/HBASE-26178 > Project: HBase > Issue Type: Bug >Reporter: Clara Xiong >Priority: Major > > With ~800 node and ~500 regions per node on our large production cluster, > balancer cannot complete within hours even after we just add 2% servers after > maintenance. > The unit tests with larger number of regions are taking longer and longer > with changes to balancer with recent changes too, evident with the increment > of the time limit recent PR's included. 
> It is time to replace some of the data structure for better time complexity > including: > int[][] regionsPerServer; // serverIndex -> region list > int[][] regionsPerHost; // hostIndex -> list of regions > int[][] regionsPerRack; // rackIndex -> region list > // serverIndex -> sorted list of regions by primary region index > ArrayList> primariesOfRegionsPerServer; > // hostIndex -> sorted list of regions by primary region index > int[][] primariesOfRegionsPerHost; > // rackIndex -> sorted list of regions by primary region index > int[][] primariesOfRegionsPerRack; > Areas of algorithm improvement include: > # O(n) to O(1) time to lookup or update per server/host/rack for every move > test iteration.(n = number of regions per server/host/rack). > # O(n) to O(1) time for reserse lookup of region index from primary index. > # Recomputation of primaryRegionCountSkewCostFunction reduced from O(n) to > O(1) > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26178) Improve data structure and algorithm for BalanceClusterState to improve computation speed for large cluster
[ https://issues.apache.org/jira/browse/HBASE-26178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-26178: Summary: Improve data structure and algorithm for BalanceClusterState to improve computation speed for large cluster (was: Improve data structure for BalanceClusterState to improve computation speed for large cluster) > Improve data structure and algorithm for BalanceClusterState to improve > computation speed for large cluster > --- > > Key: HBASE-26178 > URL: https://issues.apache.org/jira/browse/HBASE-26178 > Project: HBase > Issue Type: Bug >Reporter: Clara Xiong >Priority: Major > > With ~800 node and ~500 regions per node on our large production cluster, > balancer cannot complete within hours even after we just add 2% servers after > maintenance. > The unit tests with larger number of regions are taking longer and longer > with changes to balancer with recent changes too, evident with the increment > of the time limit recent PR's included. > It is time to replace some of the data structure for better time complexity > including: > int[][] regionsPerServer; // serverIndex -> region list > int[][] regionsPerHost; // hostIndex -> list of regions > int[][] regionsPerRack; // rackIndex -> region list > // serverIndex -> sorted list of regions by primary region index > ArrayList> primariesOfRegionsPerServer; > // hostIndex -> sorted list of regions by primary region index > int[][] primariesOfRegionsPerHost; > // rackIndex -> sorted list of regions by primary region index > int[][] primariesOfRegionsPerRack; > which needs O(n) time to lookup or update per server/host/rack for every move > test iteration. (n = number of regions per server/host/rack). > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26178) Improve data structure for BalanceClusterState to improve computation speed for large cluster
[ https://issues.apache.org/jira/browse/HBASE-26178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-26178: Description: With ~800 node and ~500 regions per node on our large production cluster, balancer cannot complete within hours even after we just add 2% servers after maintenance. The unit tests with larger number of regions are taking longer and longer with changes to balancer with recent changes too, evident with the increment of the time limit recent PR's included. It is time to replace some of the data structure for better time complexity including: int[][] regionsPerServer; // serverIndex -> region list int[][] regionsPerHost; // hostIndex -> list of regions int[][] regionsPerRack; // rackIndex -> region list // serverIndex -> sorted list of regions by primary region index ArrayList> primariesOfRegionsPerServer; // hostIndex -> sorted list of regions by primary region index int[][] primariesOfRegionsPerHost; // rackIndex -> sorted list of regions by primary region index int[][] primariesOfRegionsPerRack; which needs O(n) time to lookup or update per server/host/rack for every move test iteration. (n = number of regions per server/host/rack). was: With ~800 node and ~500 regions per node on our large production cluster, balancer cannot complete within hours even after we just add 2% servers after maintenance. The unit tests with larger number of regions are taking longer and longer with changes to balancer with recent changes too, evident with the increment of the time limit recent PR's included. 
It is time to replace some of the data structure for better time complexity including: int[][] regionsPerServer; // serverIndex -> region list int[][] regionsPerHost; // hostIndex -> list of regions int[][] regionsPerRack; // rackIndex -> region list // serverIndex -> sorted list of regions by primary region index ArrayList> primariesOfRegionsPerServer; // hostIndex -> sorted list of regions by primary region index int[][] primariesOfRegionsPerHost; // rackIndex -> sorted list of regions by primary region index int[][] primariesOfRegionsPerRack; which needs O(n) time to lookup or update per server/host/rack for every move test iteration. (n = number of regions per server/host/rack). > Improve data structure for BalanceClusterState to improve computation speed > for large cluster > - > > Key: HBASE-26178 > URL: https://issues.apache.org/jira/browse/HBASE-26178 > Project: HBase > Issue Type: Bug >Reporter: Clara Xiong >Priority: Major > > With ~800 node and ~500 regions per node on our large production cluster, > balancer cannot complete within hours even after we just add 2% servers after > maintenance. > The unit tests with larger number of regions are taking longer and longer > with changes to balancer with recent changes too, evident with the increment > of the time limit recent PR's included. > It is time to replace some of the data structure for better time complexity > including: > int[][] regionsPerServer; // serverIndex -> region list > int[][] regionsPerHost; // hostIndex -> list of regions > int[][] regionsPerRack; // rackIndex -> region list > // serverIndex -> sorted list of regions by primary region index > ArrayList> primariesOfRegionsPerServer; > // hostIndex -> sorted list of regions by primary region index > int[][] primariesOfRegionsPerHost; > // rackIndex -> sorted list of regions by primary region index > int[][] primariesOfRegionsPerRack; > which needs O(n) time to lookup or update per server/host/rack for every move > test iteration. 
(n = number of regions per server/host/rack). > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26178) Improve data structure for BalanceClusterState to improve computation speed for large cluster
[ https://issues.apache.org/jira/browse/HBASE-26178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-26178: Description: With ~800 node and ~500 regions per node on our large production cluster, balancer cannot complete within hours even after we just add 2% servers after maintenance. The unit tests with larger number of regions are taking longer and longer with changes to balancer with recent changes too, evident with the increment of the time limit recent PR's included. It is time to replace some of the data structure for better time complexity including: int[][] regionsPerServer; // serverIndex -> region list int[][] regionsPerHost; // hostIndex -> list of regions int[][] regionsPerRack; // rackIndex -> region list // serverIndex -> sorted list of regions by primary region index ArrayList> primariesOfRegionsPerServer; // hostIndex -> sorted list of regions by primary region index int[][] primariesOfRegionsPerHost; // rackIndex -> sorted list of regions by primary region index int[][] primariesOfRegionsPerRack; which needs O(n) time to lookup or update per server/host/rack for every move test iteration. (n = number of regions per server/host/rack). was: With ~800 node and ~500 regions per node on our large production cluster, balancer cannot complete within hours even after we just add 2% servers after maintenance. The unit tests with larger number of regions are taking longer and longer with changes to balancer with recent changes too, evident with the increment of the time limit recent PR's included. 
It is time to replace some of the data structure for better time complexity including: int[][] regionsPerServer; // serverIndex -> region list int[][] regionsPerHost; // hostIndex -> list of regions int[][] regionsPerRack; // rackIndex -> region list // serverIndex -> sorted list of regions by primary region index ArrayList> primariesOfRegionsPerServer; // hostIndex -> sorted list of regions by primary region index int[][] primariesOfRegionsPerHost; // rackIndex -> sorted list of regions by primary region index int[][] primariesOfRegionsPerRack; which needs O(n) time to update for every move test iteration. (n = number of regions per server/host/rack). > Improve data structure for BalanceClusterState to improve computation speed > for large cluster > - > > Key: HBASE-26178 > URL: https://issues.apache.org/jira/browse/HBASE-26178 > Project: HBase > Issue Type: Bug >Reporter: Clara Xiong >Priority: Major > > With ~800 node and ~500 regions per node on our large production cluster, > balancer cannot complete within hours even after we just add 2% servers after > maintenance. > The unit tests with larger number of regions are taking longer and longer > with changes to balancer with recent changes too, evident with the increment > of the time limit recent PR's included. > It is time to replace some of the data structure for better time complexity > including: > int[][] regionsPerServer; // serverIndex -> region list > int[][] regionsPerHost; // hostIndex -> list of regions > int[][] regionsPerRack; // rackIndex -> region list > // serverIndex -> sorted list of regions by primary region index > ArrayList> primariesOfRegionsPerServer; > // hostIndex -> sorted list of regions by primary region index > int[][] primariesOfRegionsPerHost; > // rackIndex -> sorted list of regions by primary region index > int[][] primariesOfRegionsPerRack; > which needs O(n) time to lookup or update per server/host/rack for every move > test iteration. 
(n = number of regions per server/host/rack). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24643) Replace Cluster#primariesOfRegionsPerServer from int array to treemap
[ https://issues.apache.org/jira/browse/HBASE-24643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17395023#comment-17395023 ] Clara Xiong commented on HBASE-24643: - Opened https://issues.apache.org/jira/browse/HBASE-26178 to capture the larger scope of work. > Replace Cluster#primariesOfRegionsPerServer from int array to treemap > - > > Key: HBASE-24643 > URL: https://issues.apache.org/jira/browse/HBASE-24643 > Project: HBase > Issue Type: Improvement > Components: Balancer >Affects Versions: 2.3.0 >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > > Currently, primariesOfRegionsPerServer is an int array; moveRegion does heavy > work by searching the array linearly, and inserting or removing an element requires > allocating and copying the whole array. -- This message was sent by Atlassian Jira (v8.3.4#803005)
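The array-to-treemap change named in the issue title could look roughly like the sketch below. It is an assumption-laden illustration, not the patch for HBASE-24643: the class PrimaryCountsSketch and its methods are hypothetical. It tracks, for a single server, how many replicas of each primary region are hosted there, so an add or remove costs O(log n) with no whole-array allocation or copy, and a running count of co-located replicas is kept up to date as a side effect.

```java
import java.util.TreeMap;

/** Hypothetical sketch, not HBase code: primaries hosted on one server. */
final class PrimaryCountsSketch {
  // primaryRegionIndex -> number of replicas of that primary on this server
  private final TreeMap<Integer, Integer> counts = new TreeMap<>();
  // Replicas beyond the first for any primary, maintained incrementally.
  private int colocated;

  /** Records one more replica of the given primary; O(log n), no array copy. */
  void add(int primary) {
    int c = counts.merge(primary, 1, Integer::sum);
    if (c > 1) {
      colocated++;
    }
  }

  /** Removes one replica of the given primary if present; O(log n). */
  void remove(int primary) {
    Integer c = counts.get(primary);
    if (c == null) {
      return;
    }
    if (c == 1) {
      counts.remove(primary);
    } else {
      counts.put(primary, c - 1);
      colocated--;
    }
  }

  int colocatedReplicas() {
    return colocated;
  }

  int distinctPrimaries() {
    return counts.size();
  }
}
```

A TreeMap keeps the keys sorted by primary region index, which matches the "sorted list of regions by primary region index" comment on the existing arrays. A hash map would give O(1) expected updates if the ordering turns out not to be needed.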
[jira] [Created] (HBASE-26178) Improve data structure for BalanceClusterState to improve computation speed for large cluster
Clara Xiong created HBASE-26178: --- Summary: Improve data structure for BalanceClusterState to improve computation speed for large cluster Key: HBASE-26178 URL: https://issues.apache.org/jira/browse/HBASE-26178 Project: HBase Issue Type: Bug Reporter: Clara Xiong With ~800 nodes and ~500 regions per node on our large production cluster, the balancer cannot complete within hours, even after we add just 2% more servers after maintenance. Unit tests with larger numbers of regions are also taking longer and longer with recent changes to the balancer, as is evident from the increased time limits that recent PRs included. It is time to replace some of the data structures for better time complexity, including: int[][] regionsPerServer; // serverIndex -> region list int[][] regionsPerHost; // hostIndex -> list of regions int[][] regionsPerRack; // rackIndex -> region list // serverIndex -> sorted list of regions by primary region index ArrayList> primariesOfRegionsPerServer; // hostIndex -> sorted list of regions by primary region index int[][] primariesOfRegionsPerHost; // rackIndex -> sorted list of regions by primary region index int[][] primariesOfRegionsPerRack; which need O(n) time to update for every move test iteration (n = number of regions per server/host/rack). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-24643) Replace Cluster#primariesOfRegionsPerServer from int array to treemap
[ https://issues.apache.org/jira/browse/HBASE-24643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394304#comment-17394304 ] Clara Xiong edited comment on HBASE-24643 at 8/7/21, 12:05 AM: --- Replace the following with ArrayList> for quick lookup of elements by value. int[][] primariesOfRegionsPerServer; // serverIndex -> sorted list of regions by primary region // index int[][] primariesOfRegionsPerHost; // hostIndex -> sorted list of regions by primary region index int[][] primariesOfRegionsPerRack; // rackIndex -> sorted list of regions by primary region index was (Author: claraxiong): All five need to be updated for better performance for balancer. int[][] regionsPerHost; // hostIndex -> list of regions int[][] regionsPerRack; // rackIndex -> region list int[][] primariesOfRegionsPerServer; // serverIndex -> sorted list of regions by primary region // index int[][] primariesOfRegionsPerHost; // hostIndex -> sorted list of regions by primary region index int[][] primariesOfRegionsPerRack; // rackIndex -> sorted list of regions by primary region index > Replace Cluster#primariesOfRegionsPerServer from int array to treemap > - > > Key: HBASE-24643 > URL: https://issues.apache.org/jira/browse/HBASE-24643 > Project: HBase > Issue Type: Improvement > Components: Balancer >Affects Versions: 2.3.0 >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > > Currently, primariesOfRegionsPerServer is an int array, moveRegion does heavy > work by searching the array (linearly) and insert/remove an element requires > allocating/copying the whole array. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-26177) Add support to run balancer overriding current config
[ https://issues.apache.org/jira/browse/HBASE-26177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394928#comment-17394928 ] Clara Xiong commented on HBASE-26177: - I was thinking about it as an emergency tool for a forced run. But you are right, if the default setting is different for cost functions, things could be moved back. Thank you for catching this and I like the proposal. > Add support to run balancer overriding current config > - > > Key: HBASE-26177 > URL: https://issues.apache.org/jira/browse/HBASE-26177 > Project: HBase > Issue Type: Bug > Components: Balancer >Reporter: Clara Xiong >Priority: Major > > At times we want a one-time run for balancer with specific aggressive configs > such as recovering from a failure of half cluster. Currently some config are > loaded dynamically but some don't. And it could be error prone when we try to > restore the configs. > With a dry run feature coming, we want to let user run a dry run with the > overriding config to choose the config and then run it. The implementation > becomes more straightforward if we allow overidding configs for both config > and actual runs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-26177) Add support to run balancer overriding current config
[ https://issues.apache.org/jira/browse/HBASE-26177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated HBASE-26177: Description: At times we want a one-time run for balancer with specific aggressive configs such as recovering from a failure of half cluster. Currently some config are loaded dynamically but some don't. And it could be error prone when we try to restore the configs. With a dry run feature coming, we want to let user run a dry run with the overriding config to choose the config and then run it. The implementation becomes more straightforward if we allow overidding configs for both config and actual runs. was: At times we want a one-time run for balancer with specific aggressive configs such as recovering from a failure or half cluster. Currently some config are loaded dynamically but some don't. And it could be error prone when we try to restore the configs. With a dry run feature coming, we want to let user run a dry run with the overriding config to choose the config and then run it. The implementation becomes more straightforward if we allow overidding configs for both config and actual runs. > Add support to run balancer overriding current config > - > > Key: HBASE-26177 > URL: https://issues.apache.org/jira/browse/HBASE-26177 > Project: HBase > Issue Type: Bug > Components: Balancer >Reporter: Clara Xiong >Priority: Major > > At times we want a one-time run for balancer with specific aggressive configs > such as recovering from a failure of half cluster. Currently some config are > loaded dynamically but some don't. And it could be error prone when we try to > restore the configs. > With a dry run feature coming, we want to let user run a dry run with the > overriding config to choose the config and then run it. The implementation > becomes more straightforward if we allow overidding configs for both config > and actual runs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-24643) Replace Cluster#primariesOfRegionsPerServer from int array to treemap
[ https://issues.apache.org/jira/browse/HBASE-24643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394304#comment-17394304 ] Clara Xiong commented on HBASE-24643: - All five need to be updated for better performance for balancer. int[][] regionsPerHost; // hostIndex -> list of regions int[][] regionsPerRack; // rackIndex -> region list int[][] primariesOfRegionsPerServer; // serverIndex -> sorted list of regions by primary region // index int[][] primariesOfRegionsPerHost; // hostIndex -> sorted list of regions by primary region index int[][] primariesOfRegionsPerRack; // rackIndex -> sorted list of regions by primary region index > Replace Cluster#primariesOfRegionsPerServer from int array to treemap > - > > Key: HBASE-24643 > URL: https://issues.apache.org/jira/browse/HBASE-24643 > Project: HBase > Issue Type: Improvement > Components: Balancer >Affects Versions: 2.3.0 >Reporter: Huaxiang Sun >Assignee: Huaxiang Sun >Priority: Major > > Currently, primariesOfRegionsPerServer is an int array, moveRegion does heavy > work by searching the array (linearly) and insert/remove an element requires > allocating/copying the whole array. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-26147) Add dry run mode to hbase balancer
[ https://issues.apache.org/jira/browse/HBASE-26147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394278#comment-17394278 ] Clara Xiong commented on HBASE-26147: - I commented on the PR in support of it. I also opened a Jira for a balancer run (dry run or actual run) to take overriding configs: https://issues.apache.org/jira/browse/HBASE-26177 > Add dry run mode to hbase balancer > -- > > Key: HBASE-26147 > URL: https://issues.apache.org/jira/browse/HBASE-26147 > Project: HBase > Issue Type: Improvement > Components: Balancer, master >Reporter: Bryan Beaudreault >Assignee: Bryan Beaudreault >Priority: Major > > It's often rather hard to know how the cost function changes you're making > will affect the balance of the cluster, and currently the only way to know is > to run it. If the cost decisions are not good, you may have just moved many > regions towards a non-ideal balance. Region moves themselves are not free for > clients, and the resulting balance may cause a regression. > We should add a mode to the balancer so that it can be invoked without > actually executing any plans. This will allow an administrator to iterate on > their cost functions and use the balancer's logging to see how their changes > would affect the cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26177) Add support to run balancer overriding current config
Clara Xiong created HBASE-26177: --- Summary: Add support to run balancer overriding current config Key: HBASE-26177 URL: https://issues.apache.org/jira/browse/HBASE-26177 Project: HBase Issue Type: Bug Components: Balancer Reporter: Clara Xiong At times we want a one-time run of the balancer with specific aggressive configs, such as when recovering from a failure of half the cluster. Currently some configs are loaded dynamically but some are not, and it could be error prone when we try to restore the configs. With a dry run feature coming, we want to let the user run a dry run with the overriding config to choose the config and then run it. The implementation becomes more straightforward if we allow overriding configs for both dry and actual runs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
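The one-time override idea in this issue can be pictured as a read-only layer placed over the live configuration, so nothing has to be restored after the run. The sketch below is only an illustration under that assumption and does not describe how HBase implements or will implement it: the class OverrideConfigSketch is hypothetical, it uses plain maps rather than the Hadoop Configuration class, and the key shown in the usage note appears purely as an example string.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch, not HBase code: per-run overrides layered on a base config. */
final class OverrideConfigSketch {
  private final Map<String, String> base;      // live config, never mutated here
  private final Map<String, String> overrides; // private copy, valid for one run only

  OverrideConfigSketch(Map<String, String> base, Map<String, String> overrides) {
    this.base = base;
    this.overrides = new HashMap<>(overrides);
  }

  /** Looks up the override first, then the base config, then the default. */
  String get(String key, String defaultValue) {
    String v = overrides.get(key);
    if (v == null) {
      v = base.get(key);
    }
    return v != null ? v : defaultValue;
  }

  float getFloat(String key, float defaultValue) {
    return Float.parseFloat(get(key, Float.toString(defaultValue)));
  }
}
```

A dry run and an actual run could both be handed the same view, for example one built with an override of "hbase.master.balancer.stochastic.minCostNeedBalance" set to "0.01". Once the view is discarded, the base configuration is exactly as it was, which avoids the error-prone restore step the issue mentions.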
[jira] [Commented] (HBASE-26147) Add dry run mode to hbase balancer
[ https://issues.apache.org/jira/browse/HBASE-26147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17389142#comment-17389142 ] Clara Xiong commented on HBASE-26147: - https://issues.apache.org/jira/browse/HBASE-25973?jql=project%20%3D%20HBASE%20AND%20text%20~%20balancer%20ORDER%20BY%20updated%20DESC provides a thorough breakdown of costs and a comparison with minCostNeedBalance. > Add dry run mode to hbase balancer > -- > > Key: HBASE-26147 > URL: https://issues.apache.org/jira/browse/HBASE-26147 > Project: HBase > Issue Type: Improvement > Components: Balancer, master >Reporter: Bryan Beaudreault >Assignee: Bryan Beaudreault >Priority: Major > > It's often rather hard to know how the cost function changes you're making > will affect the balance of the cluster, and currently the only way to know is > to run it. If the cost decisions are not good, you may have just moved many > regions towards a non-ideal balance. Region moves themselves are not free for > clients, and the resulting balance may cause a regression. > We should add a mode to the balancer so that it can be invoked without > actually executing any plans. This will allow an administrator to iterate on > their cost functions and use the balancer's logging to see how their changes > would affect the cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-26147) Add dry run mode to hbase balancer
[ https://issues.apache.org/jira/browse/HBASE-26147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17389138#comment-17389138 ] Clara Xiong commented on HBASE-26147: - This could be very useful. I had been thinking about it for our tuning needs on a large cluster with heavy load. Thank you. I wonder if it can be enhanced to take config changes and show what the run would be. > Add dry run mode to hbase balancer > -- > > Key: HBASE-26147 > URL: https://issues.apache.org/jira/browse/HBASE-26147 > Project: HBase > Issue Type: Improvement > Components: Balancer, master >Reporter: Bryan Beaudreault >Assignee: Bryan Beaudreault >Priority: Major > > It's often rather hard to know how the cost function changes you're making > will affect the balance of the cluster, and currently the only way to know is > to run it. If the cost decisions are not good, you may have just moved many > regions towards a non-ideal balance. Region moves themselves are not free for > clients, and the resulting balance may cause a regression. > We should add a mode to the balancer so that it can be invoked without > actually executing any plans. This will allow an administrator to iterate on > their cost functions and use the balancer's logging to see how their changes > would affect the cluster.
[jira] [Commented] (HBASE-25906) UI of master-status to show recent history of balancer desicion
[ https://issues.apache.org/jira/browse/HBASE-25906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17385819#comment-17385819 ] Clara Xiong commented on HBASE-25906: - [~GeorryHuang] Can this and the related UI changes be backported to 2.3? > UI of master-status to show recent history of balancer desicion > --- > > Key: HBASE-25906 > URL: https://issues.apache.org/jira/browse/HBASE-25906 > Project: HBase > Issue Type: Improvement > Components: Balancer, master, UI >Affects Versions: 3.0.0-alpha-1, 2.5.0 >Reporter: Zhuoyue Huang >Assignee: Zhuoyue Huang >Priority: Major > Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.4 > > Attachments: screenshot-1.png > > > HBASE-24528 provides 'Balancer Decision' to display the history, which includes > decision factor details, weights, and costs while running the balancer. > This issue implements the 'Balancer Decision' UI web page.