[jira] [Commented] (HBASE-25768) Support an overall coarse and fast balance strategy for StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-25768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17806349#comment-17806349 ]

Bryan Beaudreault commented on HBASE-25768:
-------------------------------------------

Moving out of 2.6.0.

> Support an overall coarse and fast balance strategy for StochasticLoadBalancer
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-25768
>                 URL: https://issues.apache.org/jira/browse/HBASE-25768
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>    Affects Versions: 3.0.0-alpha-1, 2.0.0, 1.4.13
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>             Fix For: 3.0.0-beta-2
>
> When we use StochasticLoadBalancer + balanceByTable, we can face two
> difficulties.
> # The regions of each table are distributed uniformly, but across the
> overall cluster there is still an imbalance between RSes;
> # When there is a large-scale restart of RSes, or an expansion of groups or
> of the cluster, we want the balancer to execute as soon as possible, but the
> StochasticLoadBalancer may need a lot of time to compute costs.
> We can detect these circumstances in StochasticLoadBalancer (for example by
> using the percentage of skewed tables), and before trying the normal balance
> steps we can add a strategy that lets it balance like the SimpleLoadBalancer,
> or use only a few light cost functions.
>
> --
> This message was sent by Atlassian Jira
> (v8.20.10#820010)
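The first difficulty in the issue description, and the proposed "percentage of skewed tables" check, can be illustrated with a small sketch. This is not the code from the PR: the class `SkewCheck`, its method names, the "differs by more than one region" rule, and the threshold parameter are all assumptions made for illustration.

```java
import java.util.Arrays;

public final class SkewCheck {
    // regions[t][s] = number of regions of table t on server s (hypothetical layout).

    /** Assumed rule: skewed when the busiest and idlest servers differ by more than one region. */
    public static boolean isSkewed(int[] regionsPerServer) {
        int max = Arrays.stream(regionsPerServer).max().orElse(0);
        int min = Arrays.stream(regionsPerServer).min().orElse(0);
        return max - min > 1;
    }

    /** Fraction of tables that are skewed, in [0, 1]. */
    public static double skewedTableFraction(int[][] regions) {
        if (regions.length == 0) {
            return 0.0;
        }
        long skewed = Arrays.stream(regions).filter(SkewCheck::isSkewed).count();
        return (double) skewed / regions.length;
    }

    /** Total regions per server, summed over all tables. */
    public static int[] totalsPerServer(int[][] regions, int serverCount) {
        int[] totals = new int[serverCount];
        for (int[] table : regions) {
            for (int s = 0; s < serverCount; s++) {
                totals[s] += table[s];
            }
        }
        return totals;
    }

    /** Hypothetical trigger: go coarse when many tables are skewed or the cluster totals are skewed. */
    public static boolean shouldUseCoarseBalance(int[][] regions, int serverCount, double threshold) {
        return skewedTableFraction(regions) >= threshold
            || isSkewed(totalsPerServer(regions, serverCount));
    }

    public static void main(String[] args) {
        // Three tables, two servers: every table is balanced on its own (2 vs 1).
        int[][] regions = { {2, 1}, {2, 1}, {2, 1} };
        System.out.println(skewedTableFraction(regions));                  // 0.0
        System.out.println(Arrays.toString(totalsPerServer(regions, 2))); // [6, 3]
        System.out.println(shouldUseCoarseBalance(regions, 2, 0.5));      // true
    }
}
```

In this layout no single table is skewed, yet one server holds twice as many regions as the other, which is the per-table-balanced but cluster-imbalanced case the description mentions.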
[jira] [Commented] (HBASE-25768) Support an overall coarse and fast balance strategy for StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-25768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553386#comment-17553386 ]

Xiaolin Ha commented on HBASE-25768:
------------------------------------

The PR is ready; it targets master and branch-2.
[jira] [Commented] (HBASE-25768) Support an overall coarse and fast balance strategy for StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-25768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533612#comment-17533612 ]

Zheng Wang commented on HBASE-25768:
------------------------------------

[~Xiaolin Ha] Yeah, you are right, this patch is useful for some cases.
[jira] [Commented] (HBASE-25768) Support an overall coarse and fast balance strategy for StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-25768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17532775#comment-17532775 ]

Xiaolin Ha commented on HBASE-25768:
------------------------------------

Hi [~kingWang], the patch is for branch-2.4+. If you want to apply this patch to 2.1.0, most of the improvements made since 2.1.0 are not required. The main changes here are the chosen cost functions. You can try to backport it if you are interested. Thanks.
[jira] [Commented] (HBASE-25768) Support an overall coarse and fast balance strategy for StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-25768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17532088#comment-17532088 ]

Xiaolin Ha commented on HBASE-25768:
------------------------------------

[~filtertip], thanks. I think the main reason that disabling balanceByTable and increasing the multiplier for the table skew cost works well is that computedMaxSteps is far smaller for one round of cluster balancing: it balances all the tables on the cluster in one round. With balanceByTable enabled, balancing the cluster needs computedMaxSteps*tableCount steps in one round. That is time consuming, but it balances more accurately, because it considers all the cost functions for every table instead of converging progressively by balancing the cluster time after time. Since the costs of all tables are computed together, you may not be able to balance a table with skewed pressure when balanceByTable is disabled.

If we use an overall checker to trigger a coarse balance, it can speed things up when the cluster or a table has skew issues, because the time spent calculating costs is also multiplied by the number of cost functions and by the time each cost function takes.
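The step-count argument in Xiaolin Ha's comment above can be put into numbers with a rough sketch. It takes the comment's simplification at face value (the same computedMaxSteps for every table); the class name, the method names and the sample figures are hypothetical and are not taken from HBase.

```java
public final class BalanceStepEstimate {
    /** One round with balanceByTable disabled: all tables are balanced together. */
    public static long stepsClusterWide(long computedMaxSteps) {
        return computedMaxSteps;
    }

    /** One round with balanceByTable enabled: computedMaxSteps * tableCount. */
    public static long stepsByTable(long computedMaxSteps, int tableCount) {
        return Math.multiplyExact(computedMaxSteps, (long) tableCount);
    }

    /** Rough cost model: steps x number of cost functions x time per cost function. */
    public static long estimatedNanos(long steps, int costFunctionCount, long nanosPerCostFunction) {
        return Math.multiplyExact(
            Math.multiplyExact(steps, (long) costFunctionCount), nanosPerCostFunction);
    }

    public static void main(String[] args) {
        long computedMaxSteps = 1_000_000L; // hypothetical value
        int tableCount = 1_000;             // hypothetical value
        long together = stepsClusterWide(computedMaxSteps);
        long byTable = stepsByTable(computedMaxSteps, tableCount);
        System.out.println("steps, all tables together: " + together);
        System.out.println("steps, table by table:      " + byTable);
        // Fewer and lighter cost functions shrink the last two factors,
        // which is the lever a coarse strategy would pull.
        System.out.println("nanos with 10 functions: " + estimatedNanos(byTable, 10, 100L));
        System.out.println("nanos with 2 functions:  " + estimatedNanos(byTable, 2, 100L));
    }
}
```

Under this model a coarse pass reduces the total time in proportion to the number of cost functions it drops and to how cheap the remaining ones are.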
[jira] [Commented] (HBASE-25768) Support an overall coarse and fast balance strategy for StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-25768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529821#comment-17529821 ]

kangTwang commented on HBASE-25768:
-----------------------------------

[~filtertip] I have tried this parameter in our environment before. No, it is only a temporary workaround.
[jira] [Commented] (HBASE-25768) Support an overall coarse and fast balance strategy for StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-25768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529818#comment-17529818 ]

kangTwang commented on HBASE-25768:
-----------------------------------

[~Xiaolin Ha] Will there be a patch for HBase 2.1.0? Is the PR for the 3.x version?
[jira] [Commented] (HBASE-25768) Support an overall coarse and fast balance strategy for StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-25768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529777#comment-17529777 ]

Zheng Wang commented on HBASE-25768:
------------------------------------

I encountered a similar issue recently. A cluster has 1000+ tables; when I enabled balanceByTable, it spent several hours doing the balance. In the end I disabled it and set hbase.master.balancer.stochastic.tableSkewCost to 1000 instead, and that works well.
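For readers who want to try the workaround Zheng Wang describes, the two settings could be written out roughly as follows. The key hbase.master.balancer.stochastic.tableSkewCost and the value 1000 come from the comment; the key hbase.master.loadbalance.bytable is an assumption about which property "balanceByTable" refers to and should be checked against the HBase version in use. The helper class itself is hypothetical and only renders hbase-site.xml style entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class BalancerWorkaroundConfig {
    /** The two settings from the comment; the by-table key name is an assumption. */
    public static Map<String, String> workaround() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("hbase.master.loadbalance.bytable", "false");
        conf.put("hbase.master.balancer.stochastic.tableSkewCost", "1000");
        return conf;
    }

    /** Renders the settings as hbase-site.xml property entries. */
    public static String toSiteXml(Map<String, String> conf) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            sb.append("<property>\n")
              .append("  <name>").append(e.getKey()).append("</name>\n")
              .append("  <value>").append(e.getValue()).append("</value>\n")
              .append("</property>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toSiteXml(workaround()));
    }
}
```

As kangTwang notes elsewhere in the thread, this is a temporary scheme rather than a replacement for the coarse strategy proposed in the issue.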
[jira] [Commented] (HBASE-25768) Support an overall coarse and fast balance strategy for StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-25768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17527488#comment-17527488 ]

Xiaolin Ha commented on HBASE-25768:
------------------------------------

Thanks for your attention, [~kingWang]. I'll complete the PR in the next few days.
[jira] [Commented] (HBASE-25768) Support an overall coarse and fast balance strategy for StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-25768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526990#comment-17526990 ]

kangTwang commented on HBASE-25768:
-----------------------------------

Hi [~Xiaolin Ha]: Has this PR not been completed yet?
[jira] [Commented] (HBASE-25768) Support an overall coarse and fast balance strategy for StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-25768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511381#comment-17511381 ]

Xiaolin Ha commented on HBASE-25768:
------------------------------------

I have attached a design doc; suggestions are welcome.