[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063381#comment-16063381 ]

Kahlil Oppenheimer edited comment on HBASE-18164 at 6/26/17 4:46 PM:
---------------------------------------------------------------------
Whoops, great catch! I just uploaded a newer 08 patch that also deals with the other case.

was (Author: kahliloppenheimer):
Whoops, great catch! I just uploaded a newer patch that also deals with the other case.

> Much faster locality cost function and candidate generator
> ----------------------------------------------------------
>
>                 Key: HBASE-18164
>                 URL: https://issues.apache.org/jira/browse/HBASE-18164
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: Kahlil Oppenheimer
>            Assignee: Kahlil Oppenheimer
>            Priority: Critical
>             Fix For: 3.0.0, 1.4.0, 2.0.0-alpha-2
>
>         Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, HBASE-18164-02.patch, HBASE-18164-04.patch, HBASE-18164-05.patch, HBASE-18164-06.patch, HBASE-18164-07.patch, HBASE-18164-08.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with cluster size. That is to say, on our smaller clusters (~17 tables, ~12 region servers, ~5k regions) the balancer considers ~100,000 cluster configurations per 60s balancer run, but only ~5,000 per 60s on our bigger clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly for things like table skew, region load, etc., because the balancer does not have enough time to "think".
> We have rewritten the locality cost function to be incremental, meaning it only recomputes cost based on the most recent region move proposed by the balancer, rather than recomputing the cost across all regions/servers every iteration.
> Further, we also cache the locality of every region on every server at the beginning of the balancer's execution, for both the LocalityBasedCostFunction and the LocalityCandidateGenerator to reference. This way, they need not collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 QA clusters without issue. The speed improvements we noticed are massive: our big clusters now consider 20x more cluster configurations.
> One design decision I made is to measure locality cost as the difference between the best locality possible given the current cluster state and the currently measured locality. The old computation measured locality cost as the difference between the current locality and 100% locality; the new computation instead takes the difference between the current locality for a given region and the best locality for that region in the cluster.

-- 
This message was sent by Atlassian JIRA
(v6.4.14#64029)
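The incremental recomputation described above can be sketched as a toy example. This is an illustrative sketch under stated assumptions, not the actual StochasticLoadBalancer code: the names (localityOfRegion, totalLocality, moveRegion) and the two-region/two-server data are invented for demonstration. The key idea is that a cached region-by-server locality table lets a proposed move adjust a single term of the running total instead of rescanning every region/server pair.

```java
// Hypothetical sketch of an incremental locality cost; not HBase code.
public class IncrementalLocalityCost {
    // Cached fraction of each region's HDFS blocks stored locally on each
    // server, computed once at the start of the balancer's execution.
    // localityOfRegion[region][server], values chosen arbitrarily here.
    static double[][] localityOfRegion = {
        {0.75, 0.25},
        {0.50, 1.00},
    };

    // Running sum of per-region locality, maintained incrementally.
    static double totalLocality;

    // Full scan: done once up front (O(regions)).
    static void init(int[] regionToServer) {
        totalLocality = 0;
        for (int r = 0; r < regionToServer.length; r++) {
            totalLocality += localityOfRegion[r][regionToServer[r]];
        }
    }

    // Incremental update for one proposed move: O(1) instead of O(regions).
    static void moveRegion(int region, int fromServer, int toServer) {
        totalLocality -= localityOfRegion[region][fromServer];
        totalLocality += localityOfRegion[region][toServer];
    }

    public static void main(String[] args) {
        int[] assignment = {0, 0}; // both regions start on server 0
        init(assignment);
        System.out.println(totalLocality); // 0.75 + 0.50 = 1.25
        moveRegion(1, 0, 1);               // propose moving region 1
        System.out.println(totalLocality); // 0.75 + 1.00 = 1.75
    }
}
```

The same cached table serves both the cost function and the candidate generator, which is why the description caches it once per balancer execution rather than per iteration.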
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kahlil Oppenheimer updated HBASE-18164:
---------------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kahlil Oppenheimer updated HBASE-18164:
---------------------------------------
    Status: Open  (was: Patch Available)
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kahlil Oppenheimer updated HBASE-18164:
---------------------------------------
    Attachment: HBASE-18164-08.patch

Whoops, great catch! I just uploaded a newer patch that also deals with the other case.
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kahlil Oppenheimer updated HBASE-18164:
---------------------------------------
    Status: Open  (was: Patch Available)
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kahlil Oppenheimer updated HBASE-18164:
---------------------------------------
    Attachment: HBASE-18164-07.patch
[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063145#comment-16063145 ]

Kahlil Oppenheimer commented on HBASE-18164:
--------------------------------------------
Whoops, the 06 patch was not rebased off master. Just submitted 07 as the rebased version.
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kahlil Oppenheimer updated HBASE-18164:
---------------------------------------
    Status: Patch Available  (was: Open)
[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063120#comment-16063120 ]

Kahlil Oppenheimer edited comment on HBASE-18164 at 6/26/17 2:07 PM:
---------------------------------------------------------------------
Just added the 06 patch, which handles the locality NaN bug. Even if the integration test is ignored in master, we should still handle this case properly.

was (Author: kahliloppenheimer):
Patch that corrects locality NaN bug
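The locality NaN bug mentioned in this comment can be illustrated with a small hedged sketch. This is not the actual patch; the method name (cost) and the specific guard are assumptions. The point is that once locality cost is normalized against the best achievable locality (per the issue's design decision), a best-achievable locality of 0 would make a naive ratio evaluate 0/0 = NaN unless the case is handled explicitly.

```java
// Illustrative guard for a divide-by-zero NaN in a normalized locality
// cost; a sketch, not the HBase patch.
public class LocalityCostGuard {
    // Cost in [0, 1]: how far current locality falls short of the best
    // locality achievable in the current cluster state.
    static double cost(double currentLocality, double bestLocality) {
        if (bestLocality == 0) {
            // Nothing can be improved (e.g. no block-location data yet);
            // returning 0 avoids 0/0 producing NaN.
            return 0;
        }
        return (bestLocality - currentLocality) / bestLocality;
    }

    public static void main(String[] args) {
        System.out.println(cost(0.5, 1.0)); // halfway to best: cost 0.5
        System.out.println(cost(0.0, 0.0)); // guarded: 0.0, not NaN
    }
}
```

Without the guard, a NaN cost would silently poison the balancer's weighted cost sum, since NaN propagates through every subsequent arithmetic operation.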
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kahlil Oppenheimer updated HBASE-18164:
---------------------------------------
    Status: Patch Available  (was: Reopened)
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kahlil Oppenheimer updated HBASE-18164:
---------------------------------------
    Attachment: HBASE-18164-06.patch

Patch that corrects the locality NaN bug.
[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061231#comment-16061231 ] Kahlil Oppenheimer commented on HBASE-18164: I'll take a look over the weekend and see if I can find what's causing the test to fail. > Much faster locality cost function and candidate generator > -- > > Key: HBASE-18164 > URL: https://issues.apache.org/jira/browse/HBASE-18164 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Kahlil Oppenheimer >Assignee: Kahlil Oppenheimer >Priority: Critical > Fix For: 3.0.0, 1.4.0, 2.0.0-alpha-2 > > Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, > HBASE-18164-02.patch, HBASE-18164-04.patch, HBASE-18164-05.patch > > > We noticed that during the stochastic load balancer was not scaling well with > cluster size. That is to say that on our smaller clusters (~17 tables, ~12 > region servers, ~5k regions), the balancer considers ~100,000 cluster > configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger > clusters (~82 tables, ~160 region servers, ~13k regions) . > Because of this, our bigger clusters are not able to converge on balance as > quickly for things like table skew, region load, etc. because the balancer > does not have enough time to "think". > We have re-written the locality cost function to be incremental, meaning it > only recomputes cost based on the most recent region move proposed by the > balancer, rather than recomputing the cost across all regions/servers every > iteration. > Further, we also cache the locality of every region on every server at the > beginning of the balancer's execution for both the LocalityBasedCostFunction > and the LocalityCandidateGenerator to reference. This way, they need not > collect all HDFS blocks of every region at each iteration of the balancer. > The changes have been running in all 6 of our production clusters and all 4 > QA clusters without issue. 
The speed improvements we noticed are massive: our > big clusters now consider 20x more cluster configurations. > One design decision I made is to compute locality cost as the difference > between the best locality that is possible given the current cluster state > and the currently measured locality. The old computation measured > locality cost as the difference between the current locality and > 100% locality; the new computation instead takes the difference between > the current locality for a given region and the best locality achievable for that region > in the cluster. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
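The two ideas in the description (a per-run locality cache, and an incremental cost that re-evaluates only the moved region and normalizes against the best achievable locality rather than 100%) can be sketched roughly as follows. This is a minimal illustration with hypothetical names, not the actual HBase implementation:

```python
# Sketch of an incremental locality cost for a stochastic balancer.
# All names are hypothetical; locality[region][server] is the cached
# fraction of the region's HDFS blocks local to that server, computed
# once at the start of a balancer run.

class IncrementalLocalityCost:
    def __init__(self, locality, assignment):
        self.locality = locality            # {region: {server: fraction}}
        self.assignment = dict(assignment)  # {region: current server}
        # Best achievable locality per region given the current cluster state.
        self.best = {r: max(per.values()) for r, per in locality.items()}
        # Running sum of (best - current) over all regions, computed once.
        self.total = sum(self.best[r] - locality[r][s]
                         for r, s in self.assignment.items())

    def move(self, region, to_server):
        # O(1) update: only the moved region's term in the sum changes.
        old = self.locality[region][self.assignment[region]]
        new = self.locality[region][to_server]
        self.total += old - new
        self.assignment[region] = to_server

    def cost(self):
        # 0.0 means every region sits on its best-locality server.
        n = len(self.assignment) or 1
        return self.total / n
```

With this structure each proposed move costs O(1) to evaluate instead of O(regions x servers), which is the kind of saving that would explain the throughput gain described above.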
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164: --- Attachment: HBASE-18164-05.patch
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164: --- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054325#comment-16054325 ] Kahlil Oppenheimer commented on HBASE-18164: Whoops, you are correct! I just uploaded an updated patch.
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164: --- Status: Open (was: Patch Available)
[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052243#comment-16052243 ] Kahlil Oppenheimer commented on HBASE-18164: Sounds good!
[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052036#comment-16052036 ] Kahlil Oppenheimer edited comment on HBASE-18164 at 6/16/17 4:57 PM: - [~chia7712] you were right about the region localities already being cached. I benchmarked the locality computation without the new explicit cache I added and the performance was the same. It would appear as though the performance improvement was entirely from making the computation incremental. The patch no longer includes the explicit per-region-server locality cache. [~busbey] this patch should be ready to go now! was (Author: kahliloppenheimer): [~chia7712] you were right about the region localities already being cached. I benchmarked the locality computation without the new explicit cache I added and the performance was the same. It would appear as though the performance upgrade was entirely from making the computation incremental. [~busbey] this patch should be ready to go now!
[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052036#comment-16052036 ] Kahlil Oppenheimer commented on HBASE-18164: [~chia7712] you were right about the region localities already being cached. I benchmarked the locality computation without the new explicit cache I added and the performance was the same. It would appear as though the performance upgrade was entirely from making the computation incremental. [~busbey] this patch should be ready to go now!
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164: --- Attachment: HBASE-18164-04.patch
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164: --- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164: --- Status: Open (was: Patch Available)
[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16051116#comment-16051116 ] Kahlil Oppenheimer commented on HBASE-18164: We're running a fork of CDH 5.9 (which is a Cloudera fork of HBase 1.2 with patches backported from HBase 1.3 and HBase 2.0).
[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16051024#comment-16051024 ] Kahlil Oppenheimer commented on HBASE-18164: I'm just running a few follow-up tests to see if we can remove my explicit locality cache without hurting performance. I should have a final patch ready by tomorrow morning.
The speed improvements we noticed are massive. Our > big clusters now consider 20x more cluster configurations. > One design decision I made is to consider locality cost as the difference > between the best locality that is possible given the current cluster state, > and the currently measured locality. The old locality computation would > measure the locality cost as the difference from the current locality and > 100% locality, but this new computation instead takes the difference between > the current locality for a given region and the best locality for that region > in the cluster. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
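The design decision described above can be illustrated with a small sketch (hypothetical names and normalization, not the actual HBase implementation): instead of scoring a region's placement against 100% locality, it is scored against the best locality achievable for that region anywhere in the cluster.

```python
def locality_cost(current_locality, best_locality_in_cluster):
    """Cost of one region's placement, in [0, 1].

    Old approach: cost = 1.0 - current_locality (distance from 100%).
    New approach: distance from the best locality any server in the
    cluster could offer this region, so a region whose blocks are
    nowhere local does not permanently dominate the cost.
    """
    if best_locality_in_cluster <= 0:
        return 0.0  # no server holds any of this region's blocks
    return (best_locality_in_cluster - current_locality) / best_locality_in_cluster

# A region at 60% locality where 80% is the best achievable:
# the old cost would be 0.4; the new cost is (0.8 - 0.6) / 0.8 = 0.25,
# and moving the region to its best server drives the cost to 0.
```

Under the old scheme a region capped at 80% locality always contributes cost, even when optimally placed; under the new scheme an optimally placed region contributes zero.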
[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16051024#comment-16051024 ] Kahlil Oppenheimer edited comment on HBASE-18164 at 6/15/17 8:37 PM:

I'm just running a few follow-up tests to see if we can remove the new explicit locality cache without hurting performance. I should have a final patch ready by tomorrow morning.

was (Author: kahliloppenheimer):
I'm just running a few follow-up tests to see if we can remove my explicit locality cache without hurting performance. I should have a final patch ready by tomorrow morning.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047235#comment-16047235 ] Kahlil Oppenheimer commented on HBASE-18164:

Ok great! Just let me know if there's anything else you'd like me to do.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047148#comment-16047148 ] Kahlil Oppenheimer commented on HBASE-18164:

Thanks [~busbey]! The failures don't seem related to my changes, but I'm happy to investigate if they fail again on this next run.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043504#comment-16043504 ] Kahlil Oppenheimer commented on HBASE-18164:

Interesting, I hadn't realized that the HDFS blocks are cached in the RegionLocationFinder. I will benchmark the code tomorrow with/without the RegionLocationFinder to see if it was adding latency.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
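The per-region, per-server locality cache discussed in this thread can be sketched as a table computed once at the start of a balancer run (an illustrative model only; the names and the block-overlap locality computation are assumptions, not the actual HBase or RegionLocationFinder code):

```python
def build_locality_cache(region_blocks, server_blocks):
    """Precompute the locality of every region on every server.

    region_blocks: {region: set of HDFS block ids belonging to it}
    server_blocks: {server: set of block ids stored locally on it}
    Returns {(region, server): fraction of the region's blocks that
    are local to the server}, so the cost function and candidate
    generator can do O(1) lookups each iteration instead of
    re-collecting every region's blocks.
    """
    cache = {}
    for region, blocks in region_blocks.items():
        for server, local in server_blocks.items():
            cache[(region, server)] = (
                len(blocks & local) / len(blocks) if blocks else 0.0
            )
    return cache

regions = {"r1": {1, 2, 3, 4}, "r2": {5, 6}}
servers = {"s1": {1, 2, 5, 6}, "s2": {3, 4}}
cache = build_locality_cache(regions, servers)
# cache[("r1", "s1")] == 0.5 and cache[("r2", "s1")] == 1.0
```

Building the table is itself O(#regions × #servers), but it runs once per balancer execution rather than once per candidate move.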
[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042999#comment-16042999 ] Kahlil Oppenheimer commented on HBASE-18164:

[~tedyu] I just made your requested changes (shortening lines, renaming, and squashing into a single commit).

[~chia7712] The big bottleneck by far was the second part: collecting all the HDFS blocks of every region for every iteration of the balancer. Caching the localities at the beginning of the balancer run is responsible for most of the speedup. The first part, albeit less impactful, is still important: the old locality computation was O(# regions * # region servers), which does not scale well as the cluster gets larger. Now it's effectively O(1), which makes a substantial difference.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
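The O(# regions * # region servers) versus effectively O(1) point above can be sketched as follows (a simplified model, not the actual StochasticLoadBalancer code): a full recompute iterates over every region, while the incremental version adjusts a running total using only the single region a proposed move touches.

```python
def full_cost(placement, locality):
    """O(#regions): recompute the total locality cost from scratch.

    placement: {region: server currently hosting it}
    locality:  {(region, server): locality fraction in [0, 1]}
    """
    return sum(1.0 - locality[(r, s)] for r, s in placement.items())

def incremental_cost(total, region, old_server, new_server, locality):
    """O(1): adjust the running total for one proposed region move."""
    return (total
            - (1.0 - locality[(region, old_server)])
            + (1.0 - locality[(region, new_server)]))

locality = {("r1", "s1"): 1.0, ("r1", "s2"): 0.25,
            ("r2", "s1"): 0.5, ("r2", "s2"): 1.0}
placement = {"r1": "s1", "r2": "s1"}
total = full_cost(placement, locality)                      # 0.5
moved = incremental_cost(total, "r2", "s1", "s2", locality)
placement["r2"] = "s2"
assert moved == full_cost(placement, locality)  # delta matches full recompute
```

Since the stochastic balancer evaluates thousands of candidate moves per run, replacing the per-move full scan with a constant-time delta is what lets large clusters consider far more configurations in the same 60s budget.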
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164:
---
Attachment: HBASE-18164-02.patch

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164:
---
Status: Open (was: Patch Available)

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164:
---
Status: Patch Available (was: Open)

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164:
---
Attachment: (was: HBASE-18164-02.patch)

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164:
---
Status: Open (was: Patch Available)

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164:
---
Attachment: HBASE-18164-02.patch

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164: --- Status: Patch Available (was: Open) > Much faster locality cost function and candidate generator > -- > > Key: HBASE-18164 > URL: https://issues.apache.org/jira/browse/HBASE-18164 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Kahlil Oppenheimer >Assignee: Kahlil Oppenheimer >Priority: Critical > Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, > HBASE-18164-02.patch > > > We noticed that during the stochastic load balancer was not scaling well with > cluster size. That is to say that on our smaller clusters (~17 tables, ~12 > region servers, ~5k regions), the balancer considers ~100,000 cluster > configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger > clusters (~82 tables, ~160 region servers, ~13k regions) . > Because of this, our bigger clusters are not able to converge on balance as > quickly for things like table skew, region load, etc. because the balancer > does not have enough time to "think". > We have re-written the locality cost function to be incremental, meaning it > only recomputes cost based on the most recent region move proposed by the > balancer, rather than recomputing the cost across all regions/servers every > iteration. > Further, we also cache the locality of every region on every server at the > beginning of the balancer's execution for both the LocalityBasedCostFunction > and the LocalityCandidateGenerator to reference. This way, they need not > collect all HDFS blocks of every region at each iteration of the balancer. > The changes have been running in all 6 of our production clusters and all 4 > QA clusters without issue. The speed improvements we noticed are massive. Our > big clusters now consider 20x more cluster configurations. 
> One design decision I made is to consider locality cost as the difference > between the best locality that is possible given the current cluster state > and the currently measured locality. The old locality computation would > measure the locality cost as the difference between the current locality and > 100% locality, but this new computation instead takes the difference between > the current locality for a given region and the best locality for that region > in the cluster. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
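The incremental cost function and the "gap from best achievable locality" baseline described above can be sketched in a few lines. This is a hypothetical illustration, not the patch's actual code: the class and method names (LocalityCostSketch, regionMoved) are invented, and the real implementation tracks per-server and per-rack terms rather than a flat per-region array.

```java
// Hypothetical sketch of the incremental locality cost described above:
// the cost is the gap between the best achievable locality and the current
// locality, and a proposed region move only touches the affected term,
// so each balancer iteration is O(1) instead of O(regions).
public class LocalityCostSketch {
    private final double[] bestLocality;    // best locality per region given current cluster state
    private final double[] currentLocality; // locality of each region on its current server
    private double totalGap;                // running sum of (best - current)

    public LocalityCostSketch(double[] best, double[] current) {
        this.bestLocality = best.clone();
        this.currentLocality = current.clone();
        for (int i = 0; i < best.length; i++) {
            totalGap += best[i] - current[i];
        }
    }

    /** Incrementally adjust the running gap when one region moves. */
    public void regionMoved(int region, double newLocality) {
        totalGap -= bestLocality[region] - currentLocality[region];
        currentLocality[region] = newLocality;
        totalGap += bestLocality[region] - currentLocality[region];
    }

    /** Cost scaled to [0, 1]: 0 when every region sits at its best-locality server. */
    public double cost() {
        double maxGap = 0;
        for (double b : bestLocality) {
            maxGap += b;
        }
        return maxGap == 0 ? 0 : totalGap / maxGap;
    }
}
```

Note how the baseline change matters: with the old "distance from 100%" cost, a region whose blocks simply are not present on any server would be penalized forever; measuring against the best achievable locality makes the cost reach zero once every region is on its most-local server.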
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164: --- Attachment: HBASE-18164-01.patch > Much faster locality cost function and candidate generator > -- > > Key: HBASE-18164 > URL: https://issues.apache.org/jira/browse/HBASE-18164 > Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164: --- Attachment: (was: HBASE-18164-01.patch) > Much faster locality cost function and candidate generator > -- > > Key: HBASE-18164 > URL: https://issues.apache.org/jira/browse/HBASE-18164 > Attachments: HBASE-18164-00.patch -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041308#comment-16041308 ] Kahlil Oppenheimer edited comment on HBASE-18164 at 6/7/17 6:02 PM: bq. Do you have estimate on the memory consumption for the newly introduced nested arrays? Yes, the {{cachedLocalities}} array will have a memory consumption of {{4 * numServers * numTables + 4 * numRacks * numTables}} bytes. The {{regionsToMostLocalEntities}} array will have a memory consumption of {{4 * numRegions + 4 * numRacks}} bytes. {quote} How do you handle the case where there is new region (due to split) ? I only see one assignment to cachedLocalities. {quote} The Cluster object is instantiated at the beginning of every balancer run, so each new execution picks up the previous region changes. However, during its execution, the balancer assumes locality is fixed. I also added in the new TableSkewCandidateGenerator (which I initially forgot to include). 
> Much faster locality cost function and candidate generator > -- > > Key: HBASE-18164 > URL: https://issues.apache.org/jira/browse/HBASE-18164 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
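The memory formulas quoted in the comment above are easy to sanity-check with a little arithmetic. The sketch below plugs in the big-cluster sizes from the issue description (~160 region servers, ~82 tables, ~13k regions); the rack count of 10 is an illustrative assumption, not a figure from the issue.

```java
// Sanity check of the memory formulas quoted in the comment above,
// assuming 4 bytes per float entry. numRacks = 10 is an assumption.
public class LocalityCacheMemory {
    static long cachedLocalitiesBytes(int numServers, int numRacks, int numTables) {
        return 4L * numServers * numTables + 4L * numRacks * numTables;
    }

    static long regionsToMostLocalEntitiesBytes(int numRegions, int numRacks) {
        return 4L * numRegions + 4L * numRacks;
    }

    public static void main(String[] args) {
        // Big cluster: ~160 region servers, ~82 tables, ~13k regions
        System.out.println(cachedLocalitiesBytes(160, 10, 82));         // ~55 KB
        System.out.println(regionsToMostLocalEntitiesBytes(13000, 10)); // ~52 KB
    }
}
```

Under these assumptions both caches come to tens of kilobytes, which supports the comment's point that the memory overhead of the new nested arrays is negligible even on large clusters.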
[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041308#comment-16041308 ] Kahlil Oppenheimer edited comment on HBASE-18164 at 6/7/17 6:01 PM: bq. Do you have estimate on the memory consumption for the newly introduced nested arrays? Yes, the {{cachedLocalities}} array will have a memory consumption of {{4 * numServers * numTables + 4 * numRacks * numTables}} bytes. The {{regionsToMostLocalEntities}} will array will have a memory consumption of {{4 * numRegions + 4 * numRacks}} bytes. {quote} How do you handle the case where there is new region (due to split) ? I only see one assignment to cachedLocalities. {quote} The Cluster object is instantiated at the beginning of every balancer run, so each new execution picks up the previous region changes. However, during its execution, the balancer assumes locality is fixed. I also added in the new TableSkewCandidateGenerator (which I initially forgot to include). was (Author: kahliloppenheimer): bq. Do you have estimate on the memory consumption for the newly introduced nested arrays? Yes, the array will have a memory consumption of {{4 * numServers * numTables + 4 * numRacks * numTables}} bytes. {quote} How do you handle the case where there is new region (due to split) ? I only see one assignment to cachedLocalities. {quote} The Cluster object is instantiated at the beginning of every balancer run, so each new execution picks up the previous region changes. However, during its execution, the balancer assumes locality is fixed. I also added in the new TableSkewCandidateGenerator (which I initially forgot to include). 
> Much faster locality cost function and candidate generator > -- > > Key: HBASE-18164 > URL: https://issues.apache.org/jira/browse/HBASE-18164 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Kahlil Oppenheimer >Assignee: Kahlil Oppenheimer >Priority: Critical > Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch > > > We noticed that during the stochastic load balancer was not scaling well with > cluster size. That is to say that on our smaller clusters (~17 tables, ~12 > region servers, ~5k regions), the balancer considers ~100,000 cluster > configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger > clusters (~82 tables, ~160 region servers, ~13k regions) . > Because of this, our bigger clusters are not able to converge on balance as > quickly for things like table skew, region load, etc. because the balancer > does not have enough time to "think". > We have re-written the locality cost function to be incremental, meaning it > only recomputes cost based on the most recent region move proposed by the > balancer, rather than recomputing the cost across all regions/servers every > iteration. > Further, we also cache the locality of every region on every server at the > beginning of the balancer's execution for both the LocalityBasedCostFunction > and the LocalityCandidateGenerator to reference. This way, they need not > collect all HDFS blocks of every region at each iteration of the balancer. > The changes have been running in all 6 of our production clusters and all 4 > QA clusters without issue. The speed improvements we noticed are massive. Our > big clusters now consider 20x more cluster configurations. > One design decision I made is to consider locality cost as the difference > between the best locality that is possible given the current cluster state, > and the currently measured locality. 
The old locality computation would > measure the locality cost as the difference from the current locality and > 100% locality, but this new computation instead takes the difference between > the current locality for a given region and the best locality for that region > in the cluster. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041308#comment-16041308 ] Kahlil Oppenheimer edited comment on HBASE-18164 at 6/7/17 5:58 PM: bq. Do you have estimate on the memory consumption for the newly introduced nested arrays? Yes, the array will have a memory consumption of {{(4 * numServers * numTables + 4 * numRacks * numTables)}} bytes. {quote} How do you handle the case where there is new region (due to split) ? I only see one assignment to cachedLocalities. {quote} The Cluster object is instantiated at the beginning of every balancer run, so each new execution picks up the previous region changes. However, during its execution, the balancer assumes locality is fixed. I also added in the new TableSkewCandidateGenerator (which I initially forgot to include). was (Author: kahliloppenheimer): bq. Do you have estimate on the memory consumption for the newly introduced nested arrays? Yes, the array will have a memory consumption of {{(4 * numServers * numTables + 4 * numRacks * numTables)} bytes. {quote} How do you handle the case where there is new region (due to split) ? I only see one assignment to cachedLocalities. {quote} The Cluster object is instantiated at the beginning of every balancer run, so each new execution picks up the previous region changes. However, during its execution, the balancer assumes locality is fixed. I also added in the new TableSkewCandidateGenerator (which I initially forgot to include). > Much faster locality cost function and candidate generator > -- > > Key: HBASE-18164 > URL: https://issues.apache.org/jira/browse/HBASE-18164 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Kahlil Oppenheimer >Assignee: Kahlil Oppenheimer >Priority: Critical > Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch > > > We noticed that during the stochastic load balancer was not scaling well with > cluster size. 
That is to say that on our smaller clusters (~17 tables, ~12 > region servers, ~5k regions), the balancer considers ~100,000 cluster > configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger > clusters (~82 tables, ~160 region servers, ~13k regions) . > Because of this, our bigger clusters are not able to converge on balance as > quickly for things like table skew, region load, etc. because the balancer > does not have enough time to "think". > We have re-written the locality cost function to be incremental, meaning it > only recomputes cost based on the most recent region move proposed by the > balancer, rather than recomputing the cost across all regions/servers every > iteration. > Further, we also cache the locality of every region on every server at the > beginning of the balancer's execution for both the LocalityBasedCostFunction > and the LocalityCandidateGenerator to reference. This way, they need not > collect all HDFS blocks of every region at each iteration of the balancer. > The changes have been running in all 6 of our production clusters and all 4 > QA clusters without issue. The speed improvements we noticed are massive. Our > big clusters now consider 20x more cluster configurations. > One design decision I made is to consider locality cost as the difference > between the best locality that is possible given the current cluster state, > and the currently measured locality. The old locality computation would > measure the locality cost as the difference from the current locality and > 100% locality, but this new computation instead takes the difference between > the current locality for a given region and the best locality for that region > in the cluster. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041308#comment-16041308 ] Kahlil Oppenheimer edited comment on HBASE-18164 at 6/7/17 5:58 PM: bq. Do you have estimate on the memory consumption for the newly introduced nested arrays? Yes, the array will have a memory consumption of {{4 * numServers * numTables + 4 * numRacks * numTables}} bytes. {quote} How do you handle the case where there is new region (due to split) ? I only see one assignment to cachedLocalities. {quote} The Cluster object is instantiated at the beginning of every balancer run, so each new execution picks up the previous region changes. However, during its execution, the balancer assumes locality is fixed. I also added in the new TableSkewCandidateGenerator (which I initially forgot to include). was (Author: kahliloppenheimer): bq. Do you have estimate on the memory consumption for the newly introduced nested arrays? Yes, the array will have a memory consumption of {{(4 * numServers * numTables + 4 * numRacks * numTables)}} bytes. {quote} How do you handle the case where there is new region (due to split) ? I only see one assignment to cachedLocalities. {quote} The Cluster object is instantiated at the beginning of every balancer run, so each new execution picks up the previous region changes. However, during its execution, the balancer assumes locality is fixed. I also added in the new TableSkewCandidateGenerator (which I initially forgot to include). > Much faster locality cost function and candidate generator > -- > > Key: HBASE-18164 > URL: https://issues.apache.org/jira/browse/HBASE-18164 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Kahlil Oppenheimer >Assignee: Kahlil Oppenheimer >Priority: Critical > Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch > > > We noticed that during the stochastic load balancer was not scaling well with > cluster size. 
That is to say that on our smaller clusters (~17 tables, ~12 > region servers, ~5k regions), the balancer considers ~100,000 cluster > configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger > clusters (~82 tables, ~160 region servers, ~13k regions) . > Because of this, our bigger clusters are not able to converge on balance as > quickly for things like table skew, region load, etc. because the balancer > does not have enough time to "think". > We have re-written the locality cost function to be incremental, meaning it > only recomputes cost based on the most recent region move proposed by the > balancer, rather than recomputing the cost across all regions/servers every > iteration. > Further, we also cache the locality of every region on every server at the > beginning of the balancer's execution for both the LocalityBasedCostFunction > and the LocalityCandidateGenerator to reference. This way, they need not > collect all HDFS blocks of every region at each iteration of the balancer. > The changes have been running in all 6 of our production clusters and all 4 > QA clusters without issue. The speed improvements we noticed are massive. Our > big clusters now consider 20x more cluster configurations. > One design decision I made is to consider locality cost as the difference > between the best locality that is possible given the current cluster state, > and the currently measured locality. The old locality computation would > measure the locality cost as the difference from the current locality and > 100% locality, but this new computation instead takes the difference between > the current locality for a given region and the best locality for that region > in the cluster. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041308#comment-16041308 ] Kahlil Oppenheimer edited comment on HBASE-18164 at 6/7/17 5:57 PM: bq. Do you have estimate on the memory consumption for the newly introduced nested arrays? Yes, the array will have a memory consumption of {{(4 * numServers * numTables + 4 * numRacks * numTables)} bytes. {quote} How do you handle the case where there is new region (due to split) ? I only see one assignment to cachedLocalities. {quote} The Cluster object is instantiated at the beginning of every balancer run, so each new execution picks up the previous region changes. However, during its execution, the balancer assumes locality is fixed. I also added in the new TableSkewCandidateGenerator (which I initially forgot to include). was (Author: kahliloppenheimer): bq. Do you have estimate on the memory consumption for the newly introduced nested arrays? Yes, the array will have a memory consumption of {{ (4 * numServers * numTables + 4 * numRacks * numTables)} bytes. {quote} How do you handle the case where there is new region (due to split) ? I only see one assignment to cachedLocalities. {quote} The Cluster object is instantiated at the beginning of every balancer run, so each new execution picks up the previous region changes. However, during its execution, the balancer assumes locality is fixed. I also added in the new TableSkewCandidateGenerator (which I initially forgot to include). > Much faster locality cost function and candidate generator > -- > > Key: HBASE-18164 > URL: https://issues.apache.org/jira/browse/HBASE-18164 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Kahlil Oppenheimer >Assignee: Kahlil Oppenheimer >Priority: Critical > Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch > > > We noticed that during the stochastic load balancer was not scaling well with > cluster size. 
That is to say that on our smaller clusters (~17 tables, ~12 > region servers, ~5k regions), the balancer considers ~100,000 cluster > configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger > clusters (~82 tables, ~160 region servers, ~13k regions) . > Because of this, our bigger clusters are not able to converge on balance as > quickly for things like table skew, region load, etc. because the balancer > does not have enough time to "think". > We have re-written the locality cost function to be incremental, meaning it > only recomputes cost based on the most recent region move proposed by the > balancer, rather than recomputing the cost across all regions/servers every > iteration. > Further, we also cache the locality of every region on every server at the > beginning of the balancer's execution for both the LocalityBasedCostFunction > and the LocalityCandidateGenerator to reference. This way, they need not > collect all HDFS blocks of every region at each iteration of the balancer. > The changes have been running in all 6 of our production clusters and all 4 > QA clusters without issue. The speed improvements we noticed are massive. Our > big clusters now consider 20x more cluster configurations. > One design decision I made is to consider locality cost as the difference > between the best locality that is possible given the current cluster state, > and the currently measured locality. The old locality computation would > measure the locality cost as the difference from the current locality and > 100% locality, but this new computation instead takes the difference between > the current locality for a given region and the best locality for that region > in the cluster. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041308#comment-16041308 ] Kahlil Oppenheimer edited comment on HBASE-18164 at 6/7/17 5:57 PM: bq. Do you have estimate on the memory consumption for the newly introduced nested arrays? Yes, the array will have a memory consumption of {{ (4 * numServers * numTables + 4 * numRacks * numTables)} bytes. {quote} How do you handle the case where there is new region (due to split) ? I only see one assignment to cachedLocalities. {quote} The Cluster object is instantiated at the beginning of every balancer run, so each new execution picks up the previous region changes. However, during its execution, the balancer assumes locality is fixed. I also added in the new TableSkewCandidateGenerator (which I initially forgot to include). was (Author: kahliloppenheimer): bq. Do you have estimate on the memory consumption for the newly introduced nested arrays? Yes, the array will have a memory consumption of {code} (4 * numServers * numTables + 4 * numRacks * numTables) {code} bytes. {quote} How do you handle the case where there is new region (due to split) ? I only see one assignment to cachedLocalities. {quote} The Cluster object is instantiated at the beginning of every balancer run, so each new execution picks up the previous region changes. However, during its execution, the balancer assumes locality is fixed. I also added in the new TableSkewCandidateGenerator (which I initially forgot to include). > Much faster locality cost function and candidate generator > -- > > Key: HBASE-18164 > URL: https://issues.apache.org/jira/browse/HBASE-18164 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Kahlil Oppenheimer >Assignee: Kahlil Oppenheimer >Priority: Critical > Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch > > > We noticed that during the stochastic load balancer was not scaling well with > cluster size. 
That is to say that on our smaller clusters (~17 tables, ~12 > region servers, ~5k regions), the balancer considers ~100,000 cluster > configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger > clusters (~82 tables, ~160 region servers, ~13k regions) . > Because of this, our bigger clusters are not able to converge on balance as > quickly for things like table skew, region load, etc. because the balancer > does not have enough time to "think". > We have re-written the locality cost function to be incremental, meaning it > only recomputes cost based on the most recent region move proposed by the > balancer, rather than recomputing the cost across all regions/servers every > iteration. > Further, we also cache the locality of every region on every server at the > beginning of the balancer's execution for both the LocalityBasedCostFunction > and the LocalityCandidateGenerator to reference. This way, they need not > collect all HDFS blocks of every region at each iteration of the balancer. > The changes have been running in all 6 of our production clusters and all 4 > QA clusters without issue. The speed improvements we noticed are massive. Our > big clusters now consider 20x more cluster configurations. > One design decision I made is to consider locality cost as the difference > between the best locality that is possible given the current cluster state, > and the currently measured locality. The old locality computation would > measure the locality cost as the difference from the current locality and > 100% locality, but this new computation instead takes the difference between > the current locality for a given region and the best locality for that region > in the cluster. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041308#comment-16041308 ] Kahlil Oppenheimer edited comment on HBASE-18164 at 6/7/17 5:56 PM: bq. Do you have estimate on the memory consumption for the newly introduced nested arrays? Yes, the array will have a memory consumption of {code} (4 * numServers * numTables + 4 * numRacks * numTables) {code} bytes. {quote} How do you handle the case where there is new region (due to split) ? I only see one assignment to cachedLocalities. {quote} The Cluster object is instantiated at the beginning of every balancer run, so each new execution picks up the previous region changes. However, during its execution, the balancer assumes locality is fixed. I also added in the new TableSkewCandidateGenerator (which I initially forgot to include). was (Author: kahliloppenheimer): bq. Do you have estimate on the memory consumption for the newly introduced nested arrays? Yes, the array will have a memory consumption of {code} (4 * numServers * numTables + 4 * numRacks * numTables) {code} bytes. bq. How do you handle the case where there is new region (due to split) ? bq. I only see one assignment to cachedLocalities. The Cluster object is instantiated at the beginning of every balancer run, so each new execution picks up the previous region changes. However, during its execution, the balancer assumes locality is fixed. I also added in the new TableSkewCandidateGenerator (which I initially forgot to include). > Much faster locality cost function and candidate generator > -- > > Key: HBASE-18164 > URL: https://issues.apache.org/jira/browse/HBASE-18164 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Kahlil Oppenheimer >Assignee: Kahlil Oppenheimer >Priority: Critical > Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch > > > We noticed that during the stochastic load balancer was not scaling well with > cluster size. 
That is to say that on our smaller clusters (~17 tables, ~12 > region servers, ~5k regions), the balancer considers ~100,000 cluster > configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger > clusters (~82 tables, ~160 region servers, ~13k regions) . > Because of this, our bigger clusters are not able to converge on balance as > quickly for things like table skew, region load, etc. because the balancer > does not have enough time to "think". > We have re-written the locality cost function to be incremental, meaning it > only recomputes cost based on the most recent region move proposed by the > balancer, rather than recomputing the cost across all regions/servers every > iteration. > Further, we also cache the locality of every region on every server at the > beginning of the balancer's execution for both the LocalityBasedCostFunction > and the LocalityCandidateGenerator to reference. This way, they need not > collect all HDFS blocks of every region at each iteration of the balancer. > The changes have been running in all 6 of our production clusters and all 4 > QA clusters without issue. The speed improvements we noticed are massive. Our > big clusters now consider 20x more cluster configurations. > One design decision I made is to consider locality cost as the difference > between the best locality that is possible given the current cluster state, > and the currently measured locality. The old locality computation would > measure the locality cost as the difference from the current locality and > 100% locality, but this new computation instead takes the difference between > the current locality for a given region and the best locality for that region > in the cluster. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041308#comment-16041308 ] Kahlil Oppenheimer edited comment on HBASE-18164 at 6/7/17 5:55 PM: bq. Do you have estimate on the memory consumption for the newly introduced nested arrays? Yes, the array will have a memory consumption of {code} (4 * numServers * numTables + 4 * numRacks * numTables) {code} bytes. bq. How do you handle the case where there is new region (due to split) ? bq. I only see one assignment to cachedLocalities. The Cluster object is instantiated at the beginning of every balancer run, so each new execution picks up the previous region changes. However, during its execution, the balancer assumes locality is fixed. I also added in the new TableSkewCandidateGenerator (which I initially forgot to include). was (Author: kahliloppenheimer): bq. Do you have estimate on the memory consumption for the newly introduced nested arrays? Yes, if be {code} (4 * numServers * numTables + 4 * numRacks * numTables) {code} bytes. bq. How do you handle the case where there is new region (due to split) ? bq. I only see one assignment to cachedLocalities. The Cluster object is instantiated at the beginning of every balancer run, so each new execution picks up the previous region changes. However, during its execution, the balancer assumes locality is fixed. I also added in the new TableSkewCandidateGenerator (which I initially forgot to include).
[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041308#comment-16041308 ] Kahlil Oppenheimer commented on HBASE-18164: bq. Do you have estimate on the memory consumption for the newly introduced nested arrays? Yes, the array will have a memory consumption of {code} (4 * numServers * numTables + 4 * numRacks * numTables) {code} bytes. bq. How do you handle the case where there is new region (due to split) ? bq. I only see one assignment to cachedLocalities. The Cluster object is instantiated at the beginning of every balancer run, so each new execution picks up the previous region changes. However, during its execution, the balancer assumes locality is fixed. I also added in the new TableSkewCandidateGenerator (which I initially forgot to include).
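The back-of-the-envelope behind the quoted formula is easy to reproduce: each cache is a nested int array (4 bytes per entry), one sized servers × tables and one sized racks × tables. A quick sketch in Python (the rack count below is a made-up illustrative figure, and JVM array/object header overhead is ignored):

```python
def locality_cache_bytes(num_servers, num_racks, num_tables, bytes_per_entry=4):
    """Approximate payload of the two locality caches:
    4 * numServers * numTables + 4 * numRacks * numTables bytes."""
    return bytes_per_entry * num_tables * (num_servers + num_racks)

# "Big cluster" from the description: ~160 region servers, ~82 tables;
# assuming 10 racks purely for illustration, the caches hold roughly
# 56 KB of ints -- negligible next to a region server heap.
big = locality_cache_bytes(num_servers=160, num_racks=10, num_tables=82)
```

Even at these sizes the caches stay tiny, which is why precomputing them once per balancer run is a good trade for avoiding repeated HDFS block lookups.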
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164: --- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164: --- Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164: --- Attachment: HBASE-18164-01.patch
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164: --- Description: We noticed that the stochastic load balancer was not scaling well with cluster size. That is to say that on our smaller clusters (~17 tables, ~12 region servers, ~5k regions), the balancer considers ~100,000 cluster configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger clusters (~82 tables, ~160 region servers, ~13k regions). Because of this, our bigger clusters are not able to converge on balance as quickly for things like table skew, region load, etc., because the balancer does not have enough time to "think". We have re-written the locality cost function to be incremental, meaning it only recomputes cost based on the most recent region move proposed by the balancer, rather than recomputing the cost across all regions/servers every iteration. Further, we also cache the locality of every region on every server at the beginning of the balancer's execution for both the LocalityBasedCostFunction and the LocalityCandidateGenerator to reference. This way, they need not collect all HDFS blocks of every region at each iteration of the balancer. The changes have been running in all 6 of our production clusters and all 4 QA clusters without issue. The speed improvements we noticed are massive. Our big clusters now consider 20x more cluster configurations. One design decision I made is to consider locality cost as the difference between the best locality that is possible given the current cluster state and the currently measured locality. The old locality computation would measure the locality cost as the difference between the current locality and 100% locality, but this new computation instead takes the difference between the current locality for a given region and the best locality for that region in the cluster.
was: We noticed that the stochastic load balancer was not scaling well with cluster size. That is to say that on our smaller clusters (~17 tables, ~12 region servers, ~5k regions), the balancer considers ~100,000 cluster configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger clusters (~82 tables, ~160 region servers, ~13k regions). Because of this, our bigger clusters are not able to converge on balance as quickly for things like table skew, region load, etc., because the balancer does not have enough time to "think". We have re-written the locality cost function to be incremental, meaning it only recomputes cost based on the most recent region move proposed by the balancer, rather than recomputing the cost across all regions/servers every iteration. Further, we also cache the locality of every region on every server at the beginning of the balancer's execution for both the LocalityBasedCostFunction and the LocalityCandidateGenerator to reference. This way, they need not collect all HDFS blocks of every region at each iteration of the balancer. The changes have been running in all 6 of our production clusters and all 4 QA clusters without issue. The speed improvements we noticed are massive. Our big clusters now consider 20x more cluster configurations. We are currently preparing a patch for submission.
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164: --- Description: We noticed that the stochastic load balancer was not scaling well with cluster size. That is to say that on our smaller clusters (~17 tables, ~12 region servers, ~5k regions), the balancer considers ~100,000 cluster configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger clusters (~82 tables, ~160 region servers, ~13k regions). Because of this, our bigger clusters are not able to converge on balance as quickly for things like table skew, region load, etc., because the balancer does not have enough time to "think". We have re-written the locality cost function to be incremental, meaning it only recomputes cost based on the most recent region move proposed by the balancer, rather than recomputing the cost across all regions/servers every iteration. Further, we also cache the locality of every region on every server at the beginning of the balancer's execution for both the LocalityBasedCostFunction and the LocalityCandidateGenerator to reference. This way, they need not collect all HDFS blocks of every region at each iteration of the balancer. The changes have been running in all 6 of our production clusters and all 4 QA clusters without issue. The speed improvements we noticed are massive. Our big clusters now consider 20x more cluster configurations. One other important design decision we made was to compute locality cost as a measure of how good locality currently is compared to the best it could be (given the current cluster state). The old cost function assumed that 100% locality was always possible, and calculated the cost as the difference from that state. Instead, this new computation takes the difference from the actual best locality found across the cluster. This means that if a region server has 75% locality for a given region, but that region has less than 75% locality on all other servers, then the locality cost associated with that region is 0.
was: We noticed that the stochastic load balancer was not scaling well with cluster size. That is to say that on our smaller clusters (~17 tables, ~12 region servers, ~5k regions), the balancer considers ~100,000 cluster configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger clusters (~82 tables, ~160 region servers, ~13k regions). Because of this, our bigger clusters are not able to converge on balance as quickly for things like table skew, region load, etc., because the balancer does not have enough time to "think". We have re-written the locality cost function to be incremental, meaning it only recomputes cost based on the most recent region move proposed by the balancer, rather than recomputing the cost across all regions/servers every iteration. Further, we also cache the locality of every region on every server at the beginning of the balancer's execution for both the LocalityBasedCostFunction and the LocalityCandidateGenerator to reference. This way, they need not collect all HDFS blocks of every region at each iteration of the balancer. The changes have been running in all 6 of our production clusters and all 4 QA clusters without issue. The speed improvements we noticed are massive. Our big clusters now consider 20x more cluster configurations. We are currently preparing a patch for submission.
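The 75% example above can be made concrete with a short sketch (hypothetical Python, not the patch itself): a region's cost is the gap between the best locality any server could offer it and the locality on its current server, so a region already on its best-available server costs nothing even when it sits below 100%:

```python
def region_locality_cost(locality_by_server, current_server):
    """Cost relative to the best achievable placement, not to 100%.
    locality_by_server maps server name -> fraction of the region's
    HDFS blocks that are local on that server."""
    best = max(locality_by_server.values())
    return best - locality_by_server[current_server]

# 75% on the current server, strictly less everywhere else: cost is 0,
# so the balancer is not pushed to "fix" a placement it cannot improve.
cost = region_locality_cost({"rs1": 0.75, "rs2": 0.40, "rs3": 0.10}, "rs1")
```

Measuring against the achievable best avoids wasting balancer iterations chasing 100% locality that no move in the current cluster state can deliver.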
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164: --- Description: We noticed that the stochastic load balancer was not scaling well with cluster size. That is to say that on our smaller clusters (~17 tables, ~12 region servers, ~5k regions), the balancer considers ~100,000 cluster configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger clusters (~82 tables, ~160 region servers, ~13k regions). Because of this, our bigger clusters are not able to converge on balance as quickly for things like table skew, region load, etc., because the balancer does not have enough time to "think". We have re-written the locality cost function to be incremental, meaning it only recomputes cost based on the most recent region move proposed by the balancer, rather than recomputing the cost across all regions/servers every iteration. Further, we also cache the locality of every region on every server at the beginning of the balancer's execution for both the LocalityBasedCostFunction and the LocalityCandidateGenerator to reference. This way, they need not collect all HDFS blocks of every region at each iteration of the balancer. The changes have been running in all 6 of our production clusters and all 4 QA clusters without issue. The speed improvements we noticed are massive. Our big clusters now consider 20x more cluster configurations. We are currently preparing a patch for submission.
was: We noticed that the stochastic load balancer was not scaling well with cluster size. That is to say that on our smaller clusters (~17 tables, ~12 region servers, ~5k regions), the balancer considers ~100,000 cluster configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger clusters (~82 tables, ~160 region servers, ~13k regions). Because of this, our bigger clusters are not able to converge on balance as quickly for things like table skew, region load, etc., because the balancer does not have enough time to "think". We have re-written the locality cost function to be incremental, meaning it only recomputes cost based on the most recent region move proposed by the balancer, rather than recomputing the cost across all regions/servers every iteration. Further, we also cache the locality of every region on every server at the beginning of the balancer's execution for both the LocalityBasedCostFunction and the LocalityCandidateGenerator to reference. This way, they need not collect all HDFS blocks of every region at each iteration of the balancer. The changes have been running in all 6 of our production clusters and all 4 QA clusters without issue. The speed improvements we noticed are massive. Our big clusters now consider 20x more cluster configurations. One other important design decision we made was to compute locality cost as a measure of how good locality currently is compared to the best it could be (given the current cluster state). The old cost function assumed that 100% locality was always possible, and calculated the cost as the difference from that state. Instead, this new computation takes the difference from the actual best locality found across the cluster. This means that if a region server has 75% locality for a given region, but that region has less than 75% locality on all other servers, then the locality cost associated with that region is 0.
[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16039830#comment-16039830 ] Kahlil Oppenheimer commented on HBASE-18164: [~busbey] Thanks for the correction on priority! I have a patch uploaded and ready for review :D.
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164: --- Attachment: HBASE-18164-00.patch Patch with new locality cost function, candidate generator, and unit tests > Much faster locality cost function and candidate generator > -- > > Key: HBASE-18164 > URL: https://issues.apache.org/jira/browse/HBASE-18164 > Project: HBase > Issue Type: Improvement > Components: Balancer >Reporter: Kahlil Oppenheimer >Assignee: Kahlil Oppenheimer >Priority: Critical > Attachments: HBASE-18164-00.patch > > > We noticed that during the stochastic load balancer was not scaling well with > cluster size. That is to say that on our smaller clusters (~17 tables, ~12 > region servers, ~5k regions), the balancer considers ~100,000 cluster > configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger > clusters (~82 tables, ~160 region servers, ~13k regions) . > Because of this, our bigger clusters are not able to converge on balance as > quickly for things like table skew, region load, etc. because the balancer > does not have enough time to "think". > We have re-written the locality cost function to be incremental, meaning it > only recomputes cost based on the most recent region move proposed by the > balancer, rather than recomputing the cost across all regions/servers every > iteration. > Further, we also cache the locality of every region on every server at the > beginning of the balancer's execution for both the LocalityBasedCostFunction > and the LocalityCandidateGenerator to reference. This way, they need not > collect all HDFS blocks of every region at each iteration of the balancer. > The changes have been running in all 6 of our production clusters and all 4 > QA clusters without issue. The speed improvements we noticed are massive. Our > big clusters now consider 20x more cluster configurations. > We are currently preparing a patch for submission. 
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
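The incremental recomputation described above can be sketched as follows. This is a hypothetical illustration, not the actual HBase StochasticLoadBalancer code: the class and field names are invented, and it assumes a `locality[region][server]` cache (fraction of the region's HDFS blocks local to that server) computed once at the start of a balancer run.

```python
class IncrementalLocalityCost:
    """Sketch of an incremental locality cost (illustrative names only)."""

    def __init__(self, locality, assignment):
        # locality[region][server]: fraction of the region's HDFS blocks
        # stored locally on that server, cached once up front.
        self.locality = locality
        self.assignment = dict(assignment)  # region -> current server
        # The only full scan happens here, at construction time.
        self.total = sum(locality[r][s] for r, s in self.assignment.items())

    def move(self, region, new_server):
        # O(1) update for a single proposed move, instead of an
        # O(regions) rescan on every balancer iteration.
        old_server = self.assignment[region]
        self.total += (self.locality[region][new_server]
                       - self.locality[region][old_server])
        self.assignment[region] = new_server

    def cost(self):
        # Normalized to [0, 1]; lower is better (higher average locality).
        return 1 - self.total / len(self.assignment)
```

Each candidate move then costs a constant-time update rather than a pass over all regions and servers, which is the source of the reported speedup.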
[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator
[ https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-18164: --- Status: Patch Available (was: Open)
[jira] [Created] (HBASE-18164) Much faster locality cost function and candidate generator
Kahlil Oppenheimer created HBASE-18164: -- Summary: Much faster locality cost function and candidate generator Key: HBASE-18164 URL: https://issues.apache.org/jira/browse/HBASE-18164 Project: HBase Issue Type: Improvement Components: Balancer Reporter: Kahlil Oppenheimer Assignee: Kahlil Oppenheimer Priority: Minor
[jira] [Comment Edited] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031917#comment-16031917 ] Kahlil Oppenheimer edited comment on HBASE-17707 at 5/31/17 8:38 PM: - We do not use read replicas in our clusters. That being said, I believe these changes should still function properly with read replicas enabled. The only issue we encountered formerly was that the table skew cost could actually exceed the region replica cost, causing multiple region replicas to be hosted on the same region server. This, however, is not an issue with the table skew changes, but an issue with the fact that region replicas are enforced as a soft constraint via the cost function, rather than as a hard constraint in the balancer logic. I believe that by adjusting the region replica cost logic to scale better to large cluster sizes (as I did in this patch), we mitigate this issue. was (Author: kahliloppenheimer): We do not use read replicas in our clusters. That being said, I believe these changes should still function properly with read replicas enabled. The only issue we encountered formerly was that the table skew cost could actually exceed the region replica cost, causing multiple region replicas to be hosted on the same region server. This, however, is not an issue with the table skew changes, but an issue with the fact that region replicas are enforced as a soft constraint via the cost function, rather than a hard constraint. I believe that by adjusting the region replica cost logic to scale better to large cluster sizes (as I did in this patch), we mitigate this issue. > New More Accurate Table Skew cost function/generator > > > Key: HBASE-17707 > URL: https://issues.apache.org/jira/browse/HBASE-17707 > Project: HBase > Issue Type: New Feature > Components: Balancer >Affects Versions: 1.2.0 > Environment: CentOS Derivative with a derivative of the 3.18.43 > kernel. HBase on CDH5.9.0 with some patches. 
HDFS CDH 5.9.0 with no patches. >Reporter: Kahlil Oppenheimer >Assignee: Kahlil Oppenheimer >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, > HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, > HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, > HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, > HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, > HBASE-17707-14.patch, test-balancer2-13617.out > > > This patch includes a new version of the TableSkewCostFunction and a new > TableSkewCandidateGenerator. > The new TableSkewCostFunction computes table skew by counting the minimal > number of region moves required for a given table to perfectly balance the > table across the cluster (i.e. as if the regions from that table had been > round-robin-ed across the cluster). This number of moves is computed for each > table, then normalized to a score between 0 and 1 by dividing by the number of > moves required in the absolute worst case (i.e. the entire table is stored on > one server), and stored in an array. The cost function then takes a weighted > average of the average and maximum value across all tables. The weights in > this average are configurable to allow certain users to more strongly > penalize situations where one table is skewed versus where every table is a > little bit skewed. To better spread this value more evenly across the range > 0 to 1, we take the square root of the weighted average to get the final value. > The new TableSkewCandidateGenerator generates region moves/swaps to optimize > the above TableSkewCostFunction. It first simply tries to move regions until > each server has the right number of regions, then it swaps regions around > such that each region swap improves table skew across the cluster. 
> We tested the cost function and generator in our production clusters with > 100s of TBs of data and 100s of tables across dozens of servers and found > both to be very performant and accurate. 
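The table skew normalization described above (minimal moves per table divided by the worst case, a configurable weighted average of the mean and maximum per-table scores, then a square root) can be sketched roughly as follows. The function and parameter names are illustrative assumptions, not the actual TableSkewCostFunction API:

```python
import math

def table_skew_cost(table_counts, num_servers, w_avg=0.5, w_max=0.5):
    """table_counts: table name -> list of per-server region counts."""
    scores = []
    for counts in table_counts.values():
        total = sum(counts)
        if total == 0 or num_servers <= 1:
            scores.append(0.0)
            continue
        # Round-robin share: the most regions any server should hold.
        ideal = math.ceil(total / num_servers)
        # Minimal moves: regions sitting above the ideal per-server share.
        moves = sum(max(0, c - ideal) for c in counts)
        # Worst case: the entire table stored on one server.
        worst = total - ideal
        scores.append(moves / worst if worst > 0 else 0.0)
    # Configurable blend of "every table a bit skewed" (average) versus
    # "one table badly skewed" (max).
    weighted = (w_avg * (sum(scores) / len(scores))
                + w_max * max(scores)) / (w_avg + w_max)
    # Square root spreads the final value more evenly across 0 to 1.
    return math.sqrt(weighted)
```

A fully unbalanced table (all regions on one server) scores 1.0, a perfectly round-robined one scores 0.0.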
[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031917#comment-16031917 ] Kahlil Oppenheimer commented on HBASE-17707: We do not use read replicas in our clusters. That being said, I believe these changes should still function properly with read replicas enabled. The only issue we encountered formerly was that the table skew cost could actually exceed the region replica cost, causing multiple region replicas to be hosted on the same region server. This, however, is not an issue with my changes, but an issue with the fact that region replicas are enforced as a soft constraint via the cost function, rather than a hard constraint. I believe that by adjusting the region replica cost logic to scale better to large cluster sizes (as I did in this patch), we mitigate this issue.
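The two-phase TableSkewCandidateGenerator described in the issue (first move regions until per-server counts are even, then propose swaps and let the cost function keep only those that improve skew) might look roughly like this sketch; all names are invented for illustration:

```python
import random

def generate_candidate(server_regions):
    """server_regions: server name -> set of region names on that server.
    Returns a proposed action for the balancer to evaluate."""
    loads = {s: len(rs) for s, rs in server_regions.items()}
    busiest = max(loads, key=loads.get)
    idlest = min(loads, key=loads.get)
    # Phase 1: even out raw per-server region counts first.
    if loads[busiest] - loads[idlest] > 1:
        region = random.choice(sorted(server_regions[busiest]))
        return ('move', region, busiest, idlest)
    # Phase 2: counts are even; propose a swap between two servers. The
    # cost function then decides whether the swap improves table skew.
    s1, s2 = random.sample(sorted(server_regions), 2)
    return ('swap',
            random.choice(sorted(server_regions[s1])), s1,
            random.choice(sorted(server_regions[s2])), s2)
```

The real generator presumably targets regions of the most skewed tables rather than sampling uniformly; this sketch only shows the move-then-swap structure.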
[jira] [Comment Edited] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031917#comment-16031917 ] Kahlil Oppenheimer edited comment on HBASE-17707 at 5/31/17 8:38 PM: - We do not use read replicas in our clusters. That being said, I believe these changes should still function properly with read replicas enabled. The only issue we encountered formerly was that the table skew cost could actually exceed the region replica cost, causing multiple region replicas to be hosted on the same region server. This, however, is not an issue with the table skew changes, but an issue with the fact that region replicas are enforced as a soft constraint via the cost function, rather than a hard constraint. I believe that by adjusting the region replica cost logic to scale better to large cluster sizes (as I did in this patch), we mitigate this issue. was (Author: kahliloppenheimer): We do not use read replicas in our clusters. That being said, I believe these changes should still function properly with read replicas enabled. The only issue we encountered formerly was that the table skew cost could actually exceed the region replica cost, causing multiple region replicas to be hosted on the same region server. This, however, is not an issue with my changes, but an issue with the fact that region replicas are enforced as a soft constraint via the cost function, rather than a hard constraint. I believe that by adjusting the region replica cost logic to scale better to large cluster sizes (as I did in this patch), we mitigate this issue.
[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031812#comment-16031812 ] Kahlil Oppenheimer commented on HBASE-17707: [~tedyu] Additionally, we have been running this version of the balancer at HubSpot on all of our production and QA clusters for a few months now and have seen better results with table skew and no issues otherwise. Please let me know if there are still issues you'd like me to address.
[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031382#comment-16031382 ] Kahlil Oppenheimer commented on HBASE-17707: [~tedyu] I added logic to reset values in HBaseConfig before each test is run. One thing I noticed is that some tests would set values in the HBase config that would carry over to other tests without being reset.
[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17707: --- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17707: --- Attachment: HBASE-17707-14.patch
[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17707: --- Status: Open (was: Patch Available)
[jira] [Work stopped] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-17707 stopped by Kahlil Oppenheimer. --
[jira] [Work started] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-17707 started by Kahlil Oppenheimer.
[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17707: --- Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17707: --- Attachment: HBASE-17707-13.patch
[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17707: --- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17707: --- Attachment: (was: HBASE-17707-13.patch)
[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029941#comment-16029941 ] Kahlil Oppenheimer commented on HBASE-17707: [~enis] [~tedyu], sorry I had taken a quick break from this, but just got back to it. I've uploaded yet another version of the patch. Hopefully, this addresses all of your concerns.
[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17707: --- Status: Patch Available (was: In Progress)
[jira] [Work started] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-17707 started by Kahlil Oppenheimer.
[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17707: --- Attachment: HBASE-17707-13.patch
[jira] [Comment Edited] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938578#comment-15938578 ] Kahlil Oppenheimer edited comment on HBASE-17707 at 3/23/17 3:34 PM: - bq. We cannot maintain two different cost functions for table skew. Let's remove the old one from the code, and only have the new implementation in this patch. We cannot have dead code lying around and rot. We can close HBASE-17706 as won't fix. I will add the removal of this old cost function to my patch. bq. The new candidate generator TableSkewCandidateGenerator is not added to the SLB::candidateGenerators field which means that it is not used? I can only see the test using it. Is this intended? It has to be enabled by default. Good catch on the table skew candidate generator. I will also add that to the patch as well. I was originally going to do it in a separate patch, but it makes much more sense to just do it here. bq. Did you intend to use the raw variable here instead of calling scale again: Yup! Let's call R the range [0, 1]. We know that scale() maps values into R. We also know that sqrt() maps values from R -> R. Lastly, we know that .9 * r + .1 for any r in R yields another value in R. So we can be sure the outcome is in R. No need to call the scale function :). Before opening the patch, I'm just repeatedly running the tests 100s of times to feel more confident I haven't missed edge cases, since a lot of these test failures are very non-deterministic.
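The range argument in the comment above (scale() maps into [0, 1], sqrt() maps [0, 1] into itself, and .9 * r + .1 keeps any r in [0, 1] inside [0, 1]) can be checked mechanically. The sketch below is an illustration only and uses a hypothetical class name:

```java
// Mechanical check of the range argument: for any r in [0, 1],
// Math.sqrt(r) and 0.9 * r + 0.1 both stay within [0, 1], so the
// result never needs re-scaling.
public class RangeCheck {
    public static void main(String[] args) {
        for (int i = 0; i <= 1000; i++) {
            double r = i / 1000.0;
            double s = Math.sqrt(r);
            double t = 0.9 * r + 0.1;
            assert s >= 0.0 && s <= 1.0 : "sqrt left [0,1] at r=" + r;
            assert t >= 0.0 && t <= 1.0 : "affine map left [0,1] at r=" + r;
        }
        System.out.println("ok: both maps keep [0,1] invariant");
    }
}
```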
[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938578#comment-15938578 ] Kahlil Oppenheimer commented on HBASE-17707: bq. We cannot maintain two different cost functions for table skew. Let's remove the old one from the code, and only have the new implementation in this patch. We cannot have dead code lying around and rot. We can close HBASE-17706 as won't fix. I will add the removal of this old cost function to my patch. bq. The new candidate generator TableSkewCandidateGenerator is not added to the SLB::candidateGenerators field which means that it is not used? I can only see the test using it. Is this intended? It has to be enabled by default. Good catch on the table skew candidate generator. I will also add that to the patch as well. I was originally going to do it in a separate patch, but it makes much more sense to just do it here. bq. Did you intend to use the raw variable here instead of calling scale again: Yup! Let's call R the range [0, 1]. We know that scale() maps values into R. We also know that sqrt() maps values from R -> R. Lastly, we know that .9 * r + .1 for any r in R yields another value in R. So can be sure the outcome is in R. No need to call scale function :). > New More Accurate Table Skew cost function/generator > > > Key: HBASE-17707 > URL: https://issues.apache.org/jira/browse/HBASE-17707 > Project: HBase > Issue Type: New Feature > Components: Balancer >Affects Versions: 1.2.0 > Environment: CentOS Derivative with a derivative of the 3.18.43 > kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches. 
>Reporter: Kahlil Oppenheimer >Assignee: Kahlil Oppenheimer >Priority: Minor > Fix For: 2.0 > > Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, > HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, > HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, > HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, > HBASE-17707-11.patch, HBASE-17707-12.patch, test-balancer2-13617.out > > > This patch includes new version of the TableSkewCostFunction and a new > TableSkewCandidateGenerator. > The new TableSkewCostFunction computes table skew by counting the minimal > number of region moves required for a given table to perfectly balance the > table across the cluster (i.e. as if the regions from that table had been > round-robin-ed across the cluster). This number of moves is computer for each > table, then normalized to a score between 0-1 by dividing by the number of > moves required in the absolute worst case (i.e. the entire table is stored on > one server), and stored in an array. The cost function then takes a weighted > average of the average and maximum value across all tables. The weights in > this average are configurable to allow for certain users to more strongly > penalize situations where one table is skewed versus where every table is a > little bit skewed. To better spread this value more evenly across the range > 0-1, we take the square root of the weighted average to get the final value. > The new TableSkewCandidateGenerator generates region moves/swaps to optimize > the above TableSkewCostFunction. It first simply tries to move regions until > each server has the right number of regions, then it swaps regions around > such that each region swap improves table skew across the cluster. > We tested the cost function and generator in our production clusters with > 100s of TBs of data and 100s of tables across dozens of servers and found > both to be very performant and accurate. 
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
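The cost computation described in the issue (per-table minimal moves, normalized by the pathological worst case, then a configurable blend of average and max, square-rooted) can be sketched as follows. This is a minimal illustration, not the actual HBase patch; class and method names are invented for the example.

```java
import java.util.Arrays;

/** A minimal sketch (not the actual HBase patch; names are illustrative) of the
 *  table-skew cost described above: per-table minimal moves, normalized by the
 *  pathological worst case, then a configurable blend of average and max,
 *  square-rooted to spread scores across [0, 1]. */
public class TableSkewSketch {

    /** Minimal moves to round-robin one table across servers: every server must
     *  end up holding floor(total/servers) to ceil(total/servers) regions, so the
     *  moves needed are the larger of the total excess and the total deficit. */
    static int minMovesForTable(int[] regionsPerServer) {
        int total = Arrays.stream(regionsPerServer).sum();
        int servers = regionsPerServer.length;
        int ceil = (total + servers - 1) / servers;
        int floor = total / servers;
        int excess = 0, deficit = 0;
        for (int count : regionsPerServer) {
            if (count > ceil) excess += count - ceil;
            if (count < floor) deficit += floor - count;
        }
        return Math.max(excess, deficit);
    }

    /** Worst case: the entire table on one server; everything beyond that
     *  server's fair share must move. */
    static int pathologicalMoves(int totalRegions, int servers) {
        int ceil = (totalRegions + servers - 1) / servers;
        return Math.max(0, totalRegions - ceil);
    }

    /** Blend the max and average of the normalized per-table scores, then take
     *  the square root; sqrt maps [0, 1] onto [0, 1], so the result stays
     *  normalized. maxWeight in [0, 1] controls how strongly one badly skewed
     *  table is penalized relative to mild skew everywhere. */
    static double cost(double[] normalizedPerTable, double maxWeight) {
        double avg = Arrays.stream(normalizedPerTable).average().orElse(0);
        double max = Arrays.stream(normalizedPerTable).max().orElse(0);
        return Math.sqrt(maxWeight * max + (1 - maxWeight) * avg);
    }
}
```

This also makes the range argument from the comment concrete: each normalized score is in [0, 1], the blend of values in [0, 1] stays in [0, 1], and sqrt maps [0, 1] to [0, 1], so no further scaling is needed.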
[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17707: --- Status: Patch Available (was: Open)

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
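The two-phase candidate generation this issue describes for TableSkewCandidateGenerator (move regions until counts are even, then swap) might be sketched as below. Names are hypothetical, not the actual HBase code, and phase 2 (swap selection) is only signaled, not implemented.

```java
/** A minimal sketch (hypothetical names, not the actual HBase code) of the
 *  two-phase strategy described for TableSkewCandidateGenerator: first move
 *  regions until every server holds the right number, then (phase 2, omitted
 *  here) propose swaps that improve table skew without changing counts. */
public class SkewCandidateSketch {

    /** Phase 1: while some server holds more than its fair share, propose
     *  moving one region from it to the most underloaded server. Returns
     *  {fromServer, toServer}, or null once counts are even, signaling that
     *  the generator should fall through to swap proposals. */
    static int[] nextMove(int[] regionsPerServer) {
        int total = 0;
        for (int c : regionsPerServer) total += c;
        int ceil = (total + regionsPerServer.length - 1) / regionsPerServer.length;
        int from = -1, to = 0;
        for (int i = 0; i < regionsPerServer.length; i++) {
            if (regionsPerServer[i] > ceil) from = i;                // overloaded source
            if (regionsPerServer[i] < regionsPerServer[to]) to = i;  // least-loaded target
        }
        return from >= 0 ? new int[]{from, to} : null;
    }
}
```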
[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17707: --- Status: Open (was: Patch Available)

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17707: --- Attachment: HBASE-17707-12.patch

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15936530#comment-15936530 ] Kahlil Oppenheimer commented on HBASE-17707:

Sorry, I just realized it was unclear: the unit test pre-existed my patch for the old table skew cost function, but it now applies to the new one. It is found at TestStochasticLoadBalancer::testTableSkewCost. Also, it *is* a hard guarantee that {{numMovesPerTable <= pathologicalNumMoves}}. I made sure to be consistent with the other cost functions when creating this one.

The issue is that the old table skew cost function was fundamentally broken. It did not change its cost estimate as the balancer proposed region moves/swaps, meaning the table skew cost it estimated at the beginning of balancing was often the same as at the end, so it actually played no role in the balancing at all. I have a separate issue open, HBASE-17706, that fixes the behavior of the old TableSkewCostFunction if people would still like to use it, but I can't merge that one until this one is resolved.

In any case, it does not surprise me that this new cost function alters behavior, because table skew is effectively being considered for the first time in the balancing process. I'll go ahead and rebase/resubmit a new patch that includes the new table skew changes as well as the fix to the region replica host cost function.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931834#comment-15931834 ] Kahlil Oppenheimer edited comment on HBASE-17707 at 3/19/17 5:21 PM:

[~enis] [~tedyu] The new table skew cost function is actually guaranteed to be within the [0, 1] range (and this behavior is even unit tested!), so it does not dominate the other cost functions by being out of that range. Instead, I debugged the breaking test and found that the issue is that the region replica host cost function can produce very small values when there are a lot of regions. In my testing, I found that for some medium-large cluster sizes, the cost function can produce values as small as 2.6 x 10^(-6). Sadly, this means that even with a weight of 5000 (which is what is set in the test), the "soft" requirement that no two region replicas be hosted on the same machine when it is avoidable is not met, because the cost function's contribution is too small even at that high weight.

Instead, my latest patch updates the region replica cost function to give it a minimum value (.1) whenever any replicas are co-hosted. This ensures that if two region replicas are placed on the same host, the cost will be at least .1 (whether there are 5 or 1,000,000 regions in the cluster). This better enforces the "soft" constraint, as it makes sure that no other cost function can overpower the region replica host cost function.

was (Author: kahliloppenheimer): [~enis] [~tedyu] the new table skew cost function is actually guaranteed to be within the [0-1] range (and this behavior is even unit tested!). The cost function does not dominate over other cost functions because it is out of the [0-1] range. Instead, I debugged the breaking test and found that the issue is that the region replica host cost function can produce very small values when there are a lot of regions. In my testing, I found that for some medium-large cluster sizes, the cost function can produce values as small as 2.6 x 10^(-6). Sadly, this means that even with a weight of 5000 (which is what is set in the test), the "soft" requirement of having no two region replicas hosted on the same machine when it is avoidable is not met because the cost function has too small a contribution (even with this high weight). Instead, my latest patch updates the region replica cost function to give it a minimum value (.1) for any amount of co-hosted replicas. This makes it so that if two region replicas are placed on the same host, the cost will be at least .1 (whether or not there are 5 or 1,000,000 regions in the cluster). This better enforces the "soft" constraint as it makes sure that no other cost functions can overpower the region replica host cost function.
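The floor described in the comment (a minimum cost of .1 whenever any replicas are co-hosted, so a 5000x multiplier always yields a dominant penalty) can be sketched like this. The class name and the assumption that the raw cost arrives as a normalized ratio in [0, 1] are illustrative, not the HBase patch itself.

```java
/** A minimal sketch (names and normalization are assumptions, not the actual
 *  HBase patch) of the fix described above: clamp the normalized region-replica
 *  cost to a floor of 0.1 whenever any replicas share a host, so the configured
 *  high weight (e.g. 5000) always produces a significant weighted penalty. */
public class ReplicaCostSketch {
    static final double MIN_COST_IF_VIOLATION = 0.1;

    /** costRatio: raw normalized cost in [0, 1], which on large clusters can be
     *  tiny (e.g. ~2.6e-6) even when replicas are co-hosted. */
    static double cost(double costRatio, boolean anyCoHostedReplicas) {
        if (!anyCoHostedReplicas) return 0.0;          // no violation, no penalty
        return Math.max(MIN_COST_IF_VIOLATION, costRatio);  // violation: at least 0.1
    }
}
```

With this floor, the weighted contribution of a single co-hosted replica is at least 0.1 times the configured weight, so no combination of other cost functions in [0, 1] can outweigh it at a weight like 5000.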
[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930268#comment-15930268 ] Kahlil Oppenheimer commented on HBASE-17707:

I just added a patch that makes the RegionReplicaHostCostFunction return higher values even when the cluster is very large (a minimum value of .1 for any co-hosted replicas). I believe this solution will scale better, because it more directly preserves the constraint that the test is checking for (that absolutely no region replicas end up on the same host), even as people add new cost functions.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17707: --- Attachment: HBASE-17707-11.patch

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17707: --- Status: Patch Available (was: Reopened)

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17707: --- Attachment: HBASE-17707-11.patch

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930137#comment-15930137 ] Kahlil Oppenheimer commented on HBASE-17707: The issue here is that the tests are checking to make sure no two replicas of the same region end up on the same host, but that constraint is only enforced by having a very high weight associated with the RegionReplicaHost cost function. The issue with this is that even with a very high weight (like 5000), the cost value for this function can get really small (like .26) as the number of regions grows large. Thus, the balancer might decide to move a replica of a region to the same host as another because it benefits other cost functions (like table skew) because the RegionReplicaHost cost is so small. I have two solutions that would fix this: 1) Disable table skew generator/cost function when there are region replicas. 2) Change the RegionReplicaHost cost function to make the cost super high for any amount of replicas on the same host, regardless of how many regions are in the cluster. What are your thoughts? > New More Accurate Table Skew cost function/generator > > > Key: HBASE-17707 > URL: https://issues.apache.org/jira/browse/HBASE-17707 > Project: HBase > Issue Type: New Feature > Components: Balancer >Affects Versions: 1.2.0 > Environment: CentOS Derivative with a derivative of the 3.18.43 > kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches. >Reporter: Kahlil Oppenheimer >Assignee: Kahlil Oppenheimer >Priority: Minor > Fix For: 2.0 > > Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, > HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, > HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, > HBASE-17707-08.patch, HBASE-17707-09.patch, test-balancer2-13617.out > > > This patch includes new version of the TableSkewCostFunction and a new > TableSkewCandidateGenerator. 
> The new TableSkewCostFunction computes table skew by counting the minimal > number of region moves required to perfectly balance a given table across the > cluster (i.e. as if the regions from that table had been > round-robin-ed across the cluster). This number of moves is computed for each > table, then normalized to a score between 0 and 1 by dividing by the number of > moves required in the absolute worst case (i.e. the entire table is stored on > one server), and stored in an array. The cost function then takes a weighted > average of the average and maximum value across all tables. The weights in > this average are configurable to let users more strongly > penalize situations where one table is heavily skewed versus where every table is a > little bit skewed. To spread this value more evenly across the range > 0-1, we take the square root of the weighted average to get the final value. > The new TableSkewCandidateGenerator generates region moves/swaps to optimize > the above TableSkewCostFunction. It first tries to move regions until > each server has the right number of regions, then it swaps regions around > such that each swap improves table skew across the cluster. > We tested the cost function and generator in our production clusters with > 100s of TBs of data and 100s of tables across dozens of servers and found > both to be very performant and accurate. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
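The skew score described above (minimal moves per table, normalized by the worst case, then a weighted average of the mean and max, square-rooted) can be sketched as a small Python model. This is an illustrative simplification, not the actual HBase Java implementation; the function name, the per-table count layout, and the default weights are assumptions.

```python
import math

def table_skew_cost(regions_per_server_by_table, avg_weight=0.5, max_weight=0.5):
    """Score table skew in [0, 1]; 0 means every table is round-robin balanced."""
    scores = []
    for counts in regions_per_server_by_table:  # counts[i] = regions of this table on server i
        total, servers = sum(counts), len(counts)
        if total == 0 or servers <= 1:
            continue
        # A balanced table puts floor(total/servers) regions on each server,
        # plus one extra region on `remainder` of the servers.
        floor_share, remainder = divmod(total, servers)
        # Minimal moves: give the "one extra" slots to the fullest servers,
        # then count how many regions still sit above their server's target.
        surpluses = sorted((c - floor_share for c in counts), reverse=True)
        moves = sum(max(0, s - (1 if i < remainder else 0))
                    for i, s in enumerate(surpluses))
        # Worst case: the whole table on one server, which keeps only its share.
        worst = total - math.ceil(total / servers)
        scores.append(moves / worst if worst > 0 else 0.0)
    if not scores:
        return 0.0
    weighted = (avg_weight * (sum(scores) / len(scores)) + max_weight * max(scores))
    weighted /= (avg_weight + max_weight)
    # The square root spreads the final value more evenly across [0, 1].
    return math.sqrt(weighted)
```

For example, a table entirely on one of four servers scores 1.0, while a perfectly round-robin-ed table scores 0.0.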
[jira] [Updated] (HBASE-17706) TableSkewCostFunction improperly computes max skew
[ https://issues.apache.org/jira/browse/HBASE-17706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17706: --- Attachment: HBASE-17706-07.patch > TableSkewCostFunction improperly computes max skew > -- > > Key: HBASE-17706 > URL: https://issues.apache.org/jira/browse/HBASE-17706 > Project: HBase > Issue Type: Bug > Components: Balancer >Affects Versions: 1.2.0 > Environment: CentOS Derivative with a derivative of the 3.18.43 > kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches. >Reporter: Kahlil Oppenheimer >Assignee: Kahlil Oppenheimer >Priority: Minor > Labels: patch > Attachments: HBASE-17706-01.patch, HBASE-17706-02.patch, > HBASE-17706-03.patch, HBASE-17706-04.patch, HBASE-17706-05.patch, > HBASE-17706-06.patch, HBASE-17706-07.patch, HBASE-17706.patch > > > We noticed while running unit tests that the cost computed by the > TableSkewCostFunction did not change as the balancer ran and simulated moves across the > cluster. After investigating, we found that this happened in particular when > the cluster started out with at least one table very strongly skewed. > We noticed that the TableSkewCostFunction depends on a field of the > BaseLoadBalancer.Cluster class called numMaxRegionsPerTable, but this field > is not properly maintained as regionMoves are simulated for the cluster. The > field only ever increases as the maximum number of regions per table > increases, but it never decreases as the maximum number per table goes down. > This patch corrects that behavior so that the field is accurately maintained, > and thus the TableSkewCostFunction produces a correct value as the > balancer runs.
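The bookkeeping bug described above can be illustrated with a toy model: the per-table maximum can only be bumped up cheaply via the destination server, but when a region leaves the server that held the maximum, the maximum may shrink and must be recomputed. This is illustrative Python, not the real BaseLoadBalancer.Cluster code; the class and method names are made up.

```python
class TableRegionTracker:
    """Toy model of tracking the max regions-per-server for each table."""

    def __init__(self, counts):
        # counts[table][server] = regions of that table on that server
        self.counts = [list(row) for row in counts]
        self.max_per_table = [max(row) for row in self.counts]

    def move_region(self, table, src, dst):
        row = self.counts[table]
        row[src] -= 1
        row[dst] += 1
        if row[dst] > self.max_per_table[table]:
            # Gaining a region can only raise the max via the destination.
            self.max_per_table[table] = row[dst]
        elif row[src] + 1 == self.max_per_table[table]:
            # The source may have been the only server at the max, so the
            # max can shrink: recompute it (the reported bug skipped this step,
            # so the field only ever grew).
            self.max_per_table[table] = max(row)
```

With only the first branch (the buggy behavior), moving a region off a server holding 3 of a table's 4 regions would leave the stale maximum 3 in place even though no server holds more than 2.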
[jira] [Updated] (HBASE-17706) TableSkewCostFunction improperly computes max skew
[ https://issues.apache.org/jira/browse/HBASE-17706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17706: --- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-17706) TableSkewCostFunction improperly computes max skew
[ https://issues.apache.org/jira/browse/HBASE-17706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17706: --- Status: Open (was: Patch Available)
[jira] [Commented] (HBASE-17706) TableSkewCostFunction improperly computes max skew
[ https://issues.apache.org/jira/browse/HBASE-17706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928158#comment-15928158 ] Kahlil Oppenheimer commented on HBASE-17706: [~tedyu] I just rebased the patch and resubmitted it. It might break on one test in TestStochasticLoadBalancer2 until HBASE-17707 is merged in (which contains a fix that affects this test).
[jira] [Updated] (HBASE-17706) TableSkewCostFunction improperly computes max skew
[ https://issues.apache.org/jira/browse/HBASE-17706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17706: --- Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-17706) TableSkewCostFunction improperly computes max skew
[ https://issues.apache.org/jira/browse/HBASE-17706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17706: --- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-17706) TableSkewCostFunction improperly computes max skew
[ https://issues.apache.org/jira/browse/HBASE-17706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17706: --- Attachment: HBASE-17706-06.patch
[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926707#comment-15926707 ] Kahlil Oppenheimer commented on HBASE-17707: [~tedyu]: I just fixed the patch so that it should no longer fail that test. I diagnosed the problem: the RegionReplicaHostCostFunction produces very small values as the cluster scales to be large. In clusters with lots of regions, this made the balancer choose plans that put two replicas of the same region on the same host. To prevent this, I applied a square root to the RegionReplicaHostCostFunction so that its values stay spread across the range 0-1 even as the cluster scales up in size.
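The square-root fix works because the raw co-location cost shrinks toward zero as the cluster grows, while sqrt lifts small values enough that the weighted penalty still dominates. A hedged sketch follows; the raw-cost formula here is a simplification, not the actual RegionReplicaHostCostFunction.

```python
import math

def replica_host_cost(colocated_pairs, worst_case_pairs):
    """Simplified stand-in: fraction of worst-case replica co-location, square-rooted."""
    if worst_case_pairs == 0:
        return 0.0
    raw = colocated_pairs / worst_case_pairs
    # On a big cluster, one bad replica pair gives a tiny raw cost
    # (e.g. 1/10000 = 0.0001); sqrt lifts it to 0.01, so multiplying by a
    # large weight (like 5000) still produces a decisive penalty.
    return math.sqrt(raw)
```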
[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17707: --- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kahlil Oppenheimer updated HBASE-17707: --- Attachment: HBASE-17707-09.patch
[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901555#comment-15901555 ] Kahlil Oppenheimer commented on HBASE-17707: Investigating now
[jira] [Comment Edited] (HBASE-17707) New More Accurate Table Skew cost function/generator
[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901555#comment-15901555 ] Kahlil Oppenheimer edited comment on HBASE-17707 at 3/8/17 5:00 PM: Investigating now. If you'd like, you can roll back until I have a fix ready was (Author: kahliloppenheimer): Investigating now