[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-26 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063381#comment-16063381
 ] 

Kahlil Oppenheimer edited comment on HBASE-18164 at 6/26/17 4:46 PM:
-

Whoops, great catch! I just uploaded a newer 08 patch that also deals with the 
other case.


was (Author: kahliloppenheimer):
Whoops, great catch! I just uploaded a newer patch that also deals with the 
other case.

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Fix For: 3.0.0, 1.4.0, 2.0.0-alpha-2
>
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch, HBASE-18164-04.patch, HBASE-18164-05.patch, 
> HBASE-18164-06.patch, HBASE-18164-07.patch, HBASE-18164-08.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions) the balancer considers ~100,000 cluster 
> configurations per 60s balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly 
> for things like table skew, region load, etc., because the balancer does 
> not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.
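
To make the quoted description concrete, here is a minimal sketch of an incremental locality cost of the kind described above. It is not the HBASE-18164 patch itself: the class and field names (e.g. IncrementalLocalityCostSketch, localityPerRegionServer) are hypothetical, and it only assumes a per-region/per-server locality table computed once at the start of a balancer run.

```java
// Hypothetical sketch of an incremental locality cost, not the actual HBASE-18164 code.
// Assumes localityPerRegionServer[region][server] holds the fraction of the region's
// HDFS blocks stored on that server, captured once per balancer run.
public class IncrementalLocalityCostSketch {
  private final double[][] localityPerRegionServer; // [region][server], values in [0, 1]
  private final int[] regionToServer;               // current region assignment
  private final double[] bestLocality;              // best achievable locality per region
  private double totalCostNumerator;                // sum over regions of (best - current)

  public IncrementalLocalityCostSketch(double[][] localityPerRegionServer, int[] regionToServer) {
    this.localityPerRegionServer = localityPerRegionServer;
    this.regionToServer = regionToServer.clone();
    this.bestLocality = new double[localityPerRegionServer.length];
    for (int region = 0; region < localityPerRegionServer.length; region++) {
      for (double locality : localityPerRegionServer[region]) {
        bestLocality[region] = Math.max(bestLocality[region], locality);
      }
      totalCostNumerator +=
          bestLocality[region] - localityPerRegionServer[region][regionToServer[region]];
    }
  }

  /** O(1) update when the balancer proposes moving one region to a new server. */
  public void regionMoved(int region, int newServer) {
    int oldServer = regionToServer[region];
    totalCostNumerator -= bestLocality[region] - localityPerRegionServer[region][oldServer];
    totalCostNumerator += bestLocality[region] - localityPerRegionServer[region][newServer];
    regionToServer[region] = newServer;
  }

  /** Cost in [0, 1]: 0 when every region sits on its best-locality server. */
  public double cost() {
    double sumBest = 0;
    for (double best : bestLocality) {
      sumBest += best;
    }
    return sumBest == 0 ? 0 : totalCostNumerator / sumBest;
  }
}
```

A candidate generator could reuse the same cached table, e.g. picking a random region and proposing the server where its bestLocality value is reached, without touching HDFS block locations during the balancer loop.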



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-26 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Status: Patch Available  (was: Open)

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Fix For: 3.0.0, 1.4.0, 2.0.0-alpha-2
>
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch, HBASE-18164-04.patch, HBASE-18164-05.patch, 
> HBASE-18164-06.patch, HBASE-18164-07.patch, HBASE-18164-08.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions) the balancer considers ~100,000 cluster 
> configurations per 60s balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly 
> for things like table skew, region load, etc., because the balancer does 
> not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-26 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Status: Open  (was: Patch Available)

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Fix For: 3.0.0, 1.4.0, 2.0.0-alpha-2
>
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch, HBASE-18164-04.patch, HBASE-18164-05.patch, 
> HBASE-18164-06.patch, HBASE-18164-07.patch, HBASE-18164-08.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions) the balancer considers ~100,000 cluster 
> configurations per 60s balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly 
> for things like table skew, region load, etc., because the balancer does 
> not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-26 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Attachment: HBASE-18164-08.patch

Whoops, great catch! I just uploaded a newer patch that also deals with the 
other case.

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Fix For: 3.0.0, 1.4.0, 2.0.0-alpha-2
>
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch, HBASE-18164-04.patch, HBASE-18164-05.patch, 
> HBASE-18164-06.patch, HBASE-18164-07.patch, HBASE-18164-08.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions) the balancer considers ~100,000 cluster 
> configurations per 60s balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly 
> for things like table skew, region load, etc., because the balancer does 
> not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-26 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Status: Open  (was: Patch Available)

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Fix For: 3.0.0, 1.4.0, 2.0.0-alpha-2
>
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch, HBASE-18164-04.patch, HBASE-18164-05.patch, 
> HBASE-18164-06.patch, HBASE-18164-07.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions) the balancer considers ~100,000 cluster 
> configurations per 60s balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly 
> for things like table skew, region load, etc., because the balancer does 
> not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-26 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Attachment: HBASE-18164-07.patch

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Fix For: 3.0.0, 1.4.0, 2.0.0-alpha-2
>
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch, HBASE-18164-04.patch, HBASE-18164-05.patch, 
> HBASE-18164-06.patch, HBASE-18164-07.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions) the balancer considers ~100,000 cluster 
> configurations per 60s balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly 
> for things like table skew, region load, etc., because the balancer does 
> not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-26 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063145#comment-16063145
 ] 

Kahlil Oppenheimer commented on HBASE-18164:


Whoops, the 06 patch was not rebased off master. I just submitted 07 as the 
rebased version.

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Fix For: 3.0.0, 1.4.0, 2.0.0-alpha-2
>
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch, HBASE-18164-04.patch, HBASE-18164-05.patch, 
> HBASE-18164-06.patch, HBASE-18164-07.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions) the balancer considers ~100,000 cluster 
> configurations per 60s balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly 
> for things like table skew, region load, etc., because the balancer does 
> not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-26 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Status: Patch Available  (was: Open)

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Fix For: 3.0.0, 1.4.0, 2.0.0-alpha-2
>
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch, HBASE-18164-04.patch, HBASE-18164-05.patch, 
> HBASE-18164-06.patch, HBASE-18164-07.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions) the balancer considers ~100,000 cluster 
> configurations per 60s balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly 
> for things like table skew, region load, etc., because the balancer does 
> not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-26 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063120#comment-16063120
 ] 

Kahlil Oppenheimer edited comment on HBASE-18164 at 6/26/17 2:07 PM:
-

Just added the 06 patch, which handles the locality NaN bug. Even if the 
integration test is ignored in master, we should still handle this case properly.
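
For context, a locality ratio typically turns into NaN when a region reports zero total HDFS block weight, so the division becomes 0 / 0. A defensive guard along these lines is one way to handle that case; this is a hypothetical sketch, not the exact change in the 06 patch, and the names (LocalityMath, safeLocality) are illustrative only.

```java
// Hypothetical guard, not the exact change from the 06 patch: treat a region whose
// HDFS block report is empty as 0 locality instead of letting 0 / 0 produce NaN.
final class LocalityMath {
  static float safeLocality(long localBytes, long totalBytes) {
    if (totalBytes <= 0) {
      return 0.0f;
    }
    return (float) localBytes / totalBytes;
  }
}
```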


was (Author: kahliloppenheimer):
Patch that corrects locality NaN bug

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Fix For: 3.0.0, 1.4.0, 2.0.0-alpha-2
>
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch, HBASE-18164-04.patch, HBASE-18164-05.patch, 
> HBASE-18164-06.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions) the balancer considers ~100,000 cluster 
> configurations per 60s balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly 
> for things like table skew, region load, etc., because the balancer does 
> not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-26 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Status: Patch Available  (was: Reopened)

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Fix For: 3.0.0, 1.4.0, 2.0.0-alpha-2
>
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch, HBASE-18164-04.patch, HBASE-18164-05.patch, 
> HBASE-18164-06.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions) the balancer considers ~100,000 cluster 
> configurations per 60s balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly 
> for things like table skew, region load, etc., because the balancer does 
> not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-26 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Attachment: HBASE-18164-06.patch

Patch that corrects locality NaN bug

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Fix For: 3.0.0, 1.4.0, 2.0.0-alpha-2
>
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch, HBASE-18164-04.patch, HBASE-18164-05.patch, 
> HBASE-18164-06.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions) the balancer considers ~100,000 cluster 
> configurations per 60s balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly 
> for things like table skew, region load, etc., because the balancer does 
> not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-23 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061231#comment-16061231
 ] 

Kahlil Oppenheimer commented on HBASE-18164:


I'll take a look over the weekend and see if I can find what's causing the test 
to fail.

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Fix For: 3.0.0, 1.4.0, 2.0.0-alpha-2
>
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch, HBASE-18164-04.patch, HBASE-18164-05.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions) the balancer considers ~100,000 cluster 
> configurations per 60s balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly 
> for things like table skew, region load, etc., because the balancer does 
> not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-19 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Attachment: HBASE-18164-05.patch

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch, HBASE-18164-04.patch, HBASE-18164-05.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions) the balancer considers ~100,000 cluster 
> configurations per 60s balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly 
> for things like table skew, region load, etc., because the balancer does 
> not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-19 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Status: Patch Available  (was: Open)

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch, HBASE-18164-04.patch, HBASE-18164-05.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions) the balancer considers ~100,000 cluster 
> configurations per 60s balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly 
> for things like table skew, region load, etc., because the balancer does 
> not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-19 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054325#comment-16054325
 ] 

Kahlil Oppenheimer commented on HBASE-18164:


Whoops, you are correct! I just uploaded an updated patch.

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch, HBASE-18164-04.patch, HBASE-18164-05.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions) the balancer considers ~100,000 cluster 
> configurations per 60s balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly 
> for things like table skew, region load, etc., because the balancer does 
> not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-19 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Status: Open  (was: Patch Available)

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch, HBASE-18164-04.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions) the balancer considers ~100,000 cluster 
> configurations per 60s balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly 
> for things like table skew, region load, etc., because the balancer does 
> not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-16 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052243#comment-16052243
 ] 

Kahlil Oppenheimer commented on HBASE-18164:


Sounds good!

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch, HBASE-18164-04.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions) the balancer considers ~100,000 cluster 
> configurations per 60s balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly 
> for things like table skew, region load, etc., because the balancer does 
> not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-16 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052036#comment-16052036
 ] 

Kahlil Oppenheimer edited comment on HBASE-18164 at 6/16/17 4:57 PM:
-

[~chia7712] you were right about the region localities already being cached. I 
benchmarked the locality computation without the new explicit cache I added and 
the performance was the same. It would appear as though the performance upgrade 
was entirely from making the computation incremental. The patch no longer 
includes the explicit per-region-server locality cache.

[~busbey] this patch should be ready to go now!


was (Author: kahliloppenheimer):
[~chia7712] you were right about the region localities already being cached. I 
benchmarked the locality computation without the new explicit cache I added and 
the performance was the same. It would appear as though the performance upgrade 
was entirely from making the computation incremental.

[~busbey] this patch should be ready to go now!

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch, HBASE-18164-04.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions) the balancer considers ~100,000 cluster 
> configurations per 60s balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly 
> for things like table skew, region load, etc., because the balancer does 
> not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-16 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16052036#comment-16052036
 ] 

Kahlil Oppenheimer commented on HBASE-18164:


[~chia7712] you were right about the region localities already being cached. I 
benchmarked the locality computation without the new explicit cache I added and 
the performance was the same. It would appear as though the performance upgrade 
was entirely from making the computation incremental.

[~busbey] this patch should be ready to go now!

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch, HBASE-18164-04.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions) the balancer considers ~100,000 cluster 
> configurations per 60s balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly 
> for things like table skew, region load, etc., because the balancer does 
> not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-16 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Attachment: HBASE-18164-04.patch

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch, HBASE-18164-04.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say, on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions) the balancer considers ~100,000 cluster 
> configurations per 60s balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters cannot converge on balance as quickly 
> for things like table skew, region load, etc., because the balancer does 
> not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-16 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Status: Patch Available  (was: Open)

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch, HBASE-18164-04.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-16 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Status: Open  (was: Patch Available)

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-15 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16051116#comment-16051116
 ] 

Kahlil Oppenheimer commented on HBASE-18164:


We're running a fork of CDH 5.9 (which is a Cloudera fork of HBase 1.2 with 
patches pulled back from HBase 1.3 and HBase 2.0). 

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-15 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16051024#comment-16051024
 ] 

Kahlil Oppenheimer commented on HBASE-18164:


I'm just running a few follow-up tests to see if we can remove my explicit 
locality cache without hurting performance. I should have a final patch ready 
by tomorrow morning.

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-15 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16051024#comment-16051024
 ] 

Kahlil Oppenheimer edited comment on HBASE-18164 at 6/15/17 8:37 PM:
-

I'm just running a few follow-up tests to see if we can remove the new explicit 
locality cache without hurting performance. I should have a final patch ready 
by tomorrow morning.


was (Author: kahliloppenheimer):
I'm just running a few follow-up tests to see if we can remove my explicit 
locality cache without hurting performance. I should have a final patch ready 
by tomorrow morning.

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-12 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047235#comment-16047235
 ] 

Kahlil Oppenheimer commented on HBASE-18164:


Ok great! Just let me know if there's anything else you'd like me to do.

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-12 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047148#comment-16047148
 ] 

Kahlil Oppenheimer commented on HBASE-18164:


Thanks [~busbey]! The failures don't seem related to my changes, but I'm happy 
to investigate if they fail again on this next run.

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-08 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16043504#comment-16043504
 ] 

Kahlil Oppenheimer commented on HBASE-18164:


Interesting, I hadn't realized that the HDFS blocks are cached in the 
RegionLocationFinder. I will benchmark the code tomorrow with/without the 
RegionLocationFinder to see if it was adding latency.

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-08 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042999#comment-16042999
 ] 

Kahlil Oppenheimer commented on HBASE-18164:


[~tedyu] I just made your requested changes (shortening lines, renaming, and 
squashing into a single commit).

[~chia7712] The big bottleneck by far was the second bit about collecting all 
the HDFS blocks of every region for every iteration of the balancer. Adding the 
caching of the localities at the beginning of the balancer run is responsible 
for most of the speedup.

The first part, albeit less impactful, is still important. The old locality 
computation was O(# regions * # region servers), which does not scale well as 
the cluster gets larger. Now it's effectively O(1) per candidate move, which 
makes a substantial difference.
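
To make the incremental idea concrete, here is a minimal sketch (illustrative only, not the actual patch; names such as {{IncrementalLocalityCost}}, {{locality}}, and {{bestLocality}} are hypothetical, and the two tables are assumed to be filled once per balancer run from the cached block locations):

{code:java}
// Hedged sketch: keep the locality cost as a running sum and adjust only the
// term for the region the balancer proposes to move, so each update is O(1).
class IncrementalLocalityCost {
  private final double[][] locality;    // locality[region][server], filled once per run
  private final double[] bestLocality;  // best achievable locality per region in the cluster
  private final int numRegions;
  private double totalLoss;             // sum over regions of (bestLocality - currentLocality)

  IncrementalLocalityCost(double[][] locality, double[] bestLocality, int[] regionToServer) {
    this.locality = locality;
    this.bestLocality = bestLocality;
    this.numRegions = regionToServer.length;
    for (int region = 0; region < numRegions; region++) {
      totalLoss += bestLocality[region] - locality[region][regionToServer[region]];
    }
  }

  /** O(1) adjustment when the balancer tentatively moves one region. */
  void regionMoved(int region, int oldServer, int newServer) {
    totalLoss -= bestLocality[region] - locality[region][oldServer];
    totalLoss += bestLocality[region] - locality[region][newServer];
  }

  /** Normalized cost in [0, 1]; 0 means every region sits on its most-local server. */
  double cost() {
    return numRegions == 0 ? 0 : totalLoss / numRegions;
  }
}
{code}

Only the constructor does a full pass over the regions; every candidate move afterwards touches a constant number of array cells, which is what turns the per-iteration cost from O(# regions * # region servers) into O(1).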

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-08 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Attachment: HBASE-18164-02.patch

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-08 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Status: Open  (was: Patch Available)

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-08 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Status: Patch Available  (was: Open)

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-08 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Attachment: (was: HBASE-18164-02.patch)

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-08 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Status: Open  (was: Patch Available)

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-08 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Attachment: HBASE-18164-02.patch

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-08 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Status: Patch Available  (was: Open)

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch, 
> HBASE-18164-02.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-07 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Attachment: HBASE-18164-01.patch

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-07 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Attachment: (was: HBASE-18164-01.patch)

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-07 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041308#comment-16041308
 ] 

Kahlil Oppenheimer edited comment on HBASE-18164 at 6/7/17 6:02 PM:


bq. Do you have estimate on the memory consumption for the newly introduced 
nested arrays?
Yes, the {{cachedLocalities}} array will have a memory consumption of {{4 * 
numServers * numTables + 4 * numRacks * numTables}} bytes. The 
{{regionsToMostLocalEntities}} array will have a memory consumption of {{4 * 
numRegions + 4 * numRacks}} bytes.
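
To put rough numbers on that, using the bigger cluster from the description (~160 region servers, ~82 tables, ~13k regions) and assuming on the order of 10 racks (the rack count is an assumption; it is not stated on this issue): {{cachedLocalities}} comes to 4 * 160 * 82 + 4 * 10 * 82 = 55,760 bytes, and {{regionsToMostLocalEntities}} comes to 4 * 13,000 + 4 * 10 = 52,040 bytes, i.e. on the order of 100 KB per balancer run.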

{quote} How do you handle the case where there is new region (due to split) ? I 
only see one assignment to cachedLocalities. {quote}
The Cluster object is instantiated at the beginning of every balancer run, so 
each new execution picks up the previous region changes. However, during its 
execution, the balancer assumes locality is fixed.

I also added in the new TableSkewCandidateGenerator (which I initially forgot 
to include).


was (Author: kahliloppenheimer):
bq. Do you have estimate on the memory consumption for the newly introduced 
nested arrays?
Yes, the {{cachedLocalities}} array will have a memory consumption of {{4 * 
numServers * numTables + 4 * numRacks * numTables}} bytes. The 
{{regionsToMostLocalEntities}} will array will have a memory consumption of {{4 
* numRegions + 4 * numRacks}} bytes.

{quote} How do you handle the case where there is new region (due to split) ? I 
only see one assignment to cachedLocalities. {quote}
The Cluster object is instantiated at the beginning of every balancer run, so 
each new execution picks up the previous region changes. However, during its 
execution, the balancer assumes locality is fixed.

I also added in the new TableSkewCandidateGenerator (which I initially forgot 
to include).

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-07 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041308#comment-16041308
 ] 

Kahlil Oppenheimer edited comment on HBASE-18164 at 6/7/17 6:01 PM:


bq. Do you have estimate on the memory consumption for the newly introduced 
nested arrays?
Yes, the {{cachedLocalities}} array will have a memory consumption of {{4 * 
numServers * numTables + 4 * numRacks * numTables}} bytes. The 
{{regionsToMostLocalEntities}} will array will have a memory consumption of {{4 
* numRegions + 4 * numRacks}} bytes.

{quote} How do you handle the case where there is new region (due to split) ? I 
only see one assignment to cachedLocalities. {quote}
The Cluster object is instantiated at the beginning of every balancer run, so 
each new execution picks up the previous region changes. However, during its 
execution, the balancer assumes locality is fixed.

I also added in the new TableSkewCandidateGenerator (which I initially forgot 
to include).


was (Author: kahliloppenheimer):
bq. Do you have estimate on the memory consumption for the newly introduced 
nested arrays?
Yes, the array will have a memory consumption of {{4 * numServers * numTables + 
4 * numRacks * numTables}} bytes. 

{quote} How do you handle the case where there is new region (due to split) ? I 
only see one assignment to cachedLocalities. {quote}
The Cluster object is instantiated at the beginning of every balancer run, so 
each new execution picks up the previous region changes. However, during its 
execution, the balancer assumes locality is fixed.

I also added in the new TableSkewCandidateGenerator (which I initially forgot 
to include).

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference between the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-07 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041308#comment-16041308
 ] 

Kahlil Oppenheimer edited comment on HBASE-18164 at 6/7/17 5:58 PM:


bq. Do you have estimate on the memory consumption for the newly introduced 
nested arrays?
Yes, the array will have a memory consumption of {{(4 * numServers * numTables 
+ 4 * numRacks * numTables)}} bytes. 

{quote} How do you handle the case where there is new region (due to split) ? I 
only see one assignment to cachedLocalities. {quote}
The Cluster object is instantiated at the beginning of every balancer run, so 
each new execution picks up the previous region changes. However, during its 
execution, the balancer assumes locality is fixed.

I also added in the new TableSkewCandidateGenerator (which I initially forgot 
to include).


was (Author: kahliloppenheimer):
bq. Do you have estimate on the memory consumption for the newly introduced 
nested arrays?
Yes, the array will have a memory consumption of {{(4 * numServers * numTables 
+ 4 * numRacks * numTables)} bytes. 

{quote} How do you handle the case where there is new region (due to split) ? I 
only see one assignment to cachedLocalities. {quote}
The Cluster object is instantiated at the beginning of every balancer run, so 
each new execution picks up the previous region changes. However, during its 
execution, the balancer assumes locality is fixed.

I also added in the new TableSkewCandidateGenerator (which I initially forgot 
to include).

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch
>
>
> We noticed that during the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions) .
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference from the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-07 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041308#comment-16041308
 ] 

Kahlil Oppenheimer edited comment on HBASE-18164 at 6/7/17 5:58 PM:


bq. Do you have estimate on the memory consumption for the newly introduced 
nested arrays?
Yes, the array will have a memory consumption of {{4 * numServers * numTables + 
4 * numRacks * numTables}} bytes. 

{quote} How do you handle the case where there is new region (due to split) ? I 
only see one assignment to cachedLocalities. {quote}
The Cluster object is instantiated at the beginning of every balancer run, so 
each new execution picks up the previous region changes. However, during its 
execution, the balancer assumes locality is fixed.

I also added in the new TableSkewCandidateGenerator (which I initially forgot 
to include).


was (Author: kahliloppenheimer):
bq. Do you have estimate on the memory consumption for the newly introduced 
nested arrays?
Yes, the array will have a memory consumption of {{(4 * numServers * numTables 
+ 4 * numRacks * numTables)}} bytes. 

{quote} How do you handle the case where there is new region (due to split) ? I 
only see one assignment to cachedLocalities. {quote}
The Cluster object is instantiated at the beginning of every balancer run, so 
each new execution picks up the previous region changes. However, during its 
execution, the balancer assumes locality is fixed.

I also added in the new TableSkewCandidateGenerator (which I initially forgot 
to include).

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch
>
>
> We noticed that during the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions) .
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference from the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-07 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041308#comment-16041308
 ] 

Kahlil Oppenheimer edited comment on HBASE-18164 at 6/7/17 5:57 PM:


bq. Do you have estimate on the memory consumption for the newly introduced 
nested arrays?
Yes, the array will have a memory consumption of {{(4 * numServers * numTables 
+ 4 * numRacks * numTables)} bytes. 

{quote} How do you handle the case where there is new region (due to split) ? I 
only see one assignment to cachedLocalities. {quote}
The Cluster object is instantiated at the beginning of every balancer run, so 
each new execution picks up the previous region changes. However, during its 
execution, the balancer assumes locality is fixed.

I also added in the new TableSkewCandidateGenerator (which I initially forgot 
to include).


was (Author: kahliloppenheimer):
bq. Do you have estimate on the memory consumption for the newly introduced 
nested arrays?
Yes, the array will have a memory consumption of {{ (4 * numServers * numTables 
+ 4 * numRacks * numTables)} bytes. 

{quote} How do you handle the case where there is new region (due to split) ? I 
only see one assignment to cachedLocalities. {quote}
The Cluster object is instantiated at the beginning of every balancer run, so 
each new execution picks up the previous region changes. However, during its 
execution, the balancer assumes locality is fixed.

I also added in the new TableSkewCandidateGenerator (which I initially forgot 
to include).

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch
>
>
> We noticed that during the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions) .
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference from the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-07 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041308#comment-16041308
 ] 

Kahlil Oppenheimer edited comment on HBASE-18164 at 6/7/17 5:57 PM:


bq. Do you have estimate on the memory consumption for the newly introduced 
nested arrays?
Yes, the array will have a memory consumption of {{ (4 * numServers * numTables 
+ 4 * numRacks * numTables)} bytes. 

{quote} How do you handle the case where there is new region (due to split) ? I 
only see one assignment to cachedLocalities. {quote}
The Cluster object is instantiated at the beginning of every balancer run, so 
each new execution picks up the previous region changes. However, during its 
execution, the balancer assumes locality is fixed.

I also added in the new TableSkewCandidateGenerator (which I initially forgot 
to include).


was (Author: kahliloppenheimer):
bq. Do you have estimate on the memory consumption for the newly introduced 
nested arrays?
Yes, the array will have a memory consumption of {code} (4 * numServers * 
numTables + 4 * numRacks * numTables) {code} bytes. 

{quote} How do you handle the case where there is new region (due to split) ? I 
only see one assignment to cachedLocalities. {quote}
The Cluster object is instantiated at the beginning of every balancer run, so 
each new execution picks up the previous region changes. However, during its 
execution, the balancer assumes locality is fixed.

I also added in the new TableSkewCandidateGenerator (which I initially forgot 
to include).

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch
>
>
> We noticed that during the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions) .
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference from the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-07 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041308#comment-16041308
 ] 

Kahlil Oppenheimer edited comment on HBASE-18164 at 6/7/17 5:56 PM:


bq. Do you have estimate on the memory consumption for the newly introduced 
nested arrays?
Yes, the array will have a memory consumption of {code} (4 * numServers * 
numTables + 4 * numRacks * numTables) {code} bytes. 

{quote} How do you handle the case where there is new region (due to split) ? I 
only see one assignment to cachedLocalities. {quote}
The Cluster object is instantiated at the beginning of every balancer run, so 
each new execution picks up the previous region changes. However, during its 
execution, the balancer assumes locality is fixed.

I also added in the new TableSkewCandidateGenerator (which I initially forgot 
to include).


was (Author: kahliloppenheimer):
bq. Do you have estimate on the memory consumption for the newly introduced 
nested arrays?
Yes, the array will have a memory consumption of {code} (4 * numServers * 
numTables + 4 * numRacks * numTables) {code} bytes. 

bq. How do you handle the case where there is new region (due to split) ?
bq. I only see one assignment to cachedLocalities.
The Cluster object is instantiated at the beginning of every balancer run, so 
each new execution picks up the previous region changes. However, during its 
execution, the balancer assumes locality is fixed.

I also added in the new TableSkewCandidateGenerator (which I initially forgot 
to include).

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch
>
>
> We noticed that during the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions) .
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference from the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-07 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041308#comment-16041308
 ] 

Kahlil Oppenheimer edited comment on HBASE-18164 at 6/7/17 5:55 PM:


bq. Do you have estimate on the memory consumption for the newly introduced 
nested arrays?
Yes, the array will have a memory consumption of {code} (4 * numServers * 
numTables + 4 * numRacks * numTables) {code} bytes. 

bq. How do you handle the case where there is new region (due to split) ?
bq. I only see one assignment to cachedLocalities.
The Cluster object is instantiated at the beginning of every balancer run, so 
each new execution picks up the previous region changes. However, during its 
execution, the balancer assumes locality is fixed.

I also added in the new TableSkewCandidateGenerator (which I initially forgot 
to include).


was (Author: kahliloppenheimer):
bq. Do you have estimate on the memory consumption for the newly introduced 
nested arrays?
Yes, it'd be {code} (4 * numServers * numTables + 4 * numRacks * numTables) 
{code} bytes. 

bq. How do you handle the case where there is new region (due to split) ?
bq. I only see one assignment to cachedLocalities.
The Cluster object is instantiated at the beginning of every balancer run, so 
each new execution picks up the previous region changes. However, during its 
execution, the balancer assumes locality is fixed.

I also added in the new TableSkewCandidateGenerator (which I initially forgot 
to include).

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch
>
>
> We noticed that during the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions) .
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference from the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-07 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041308#comment-16041308
 ] 

Kahlil Oppenheimer commented on HBASE-18164:


bq. Do you have estimate on the memory consumption for the newly introduced 
nested arrays?
Yes, it'd be {code} (4 * numServers * numTables + 4 * numRacks * numTables) 
{code} bytes. 

bq. How do you handle the case where there is new region (due to split) ?
bq. I only see one assignment to cachedLocalities.
The Cluster object is instantiated at the beginning of every balancer run, so 
each new execution picks up the previous region changes. However, during its 
execution, the balancer assumes locality is fixed.

I also added in the new TableSkewCandidateGenerator (which I initially forgot 
to include).

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch
>
>
> We noticed that during the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions) .
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference from the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-07 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Status: Patch Available  (was: Open)

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch
>
>
> We noticed that during the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions) .
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference from the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-07 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Status: Open  (was: Patch Available)

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch
>
>
> We noticed that during the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions) .
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference from the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-07 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Attachment: HBASE-18164-01.patch

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch, HBASE-18164-01.patch
>
>
> We noticed that during the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions) .
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference from the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-06 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Description: 
We noticed that the stochastic load balancer was not scaling well with cluster 
size. That is to say that on our smaller clusters (~17 tables, ~12 region 
servers, ~5k regions), the balancer considers ~100,000 cluster configurations 
in 60s per balancer run, but only ~5,000 per 60s on our bigger clusters (~82 
tables, ~160 region servers, ~13k regions).

Because of this, our bigger clusters are not able to converge on balance as 
quickly for things like table skew, region load, etc. because the balancer does 
not have enough time to "think".

We have re-written the locality cost function to be incremental, meaning it 
only recomputes cost based on the most recent region move proposed by the 
balancer, rather than recomputing the cost across all regions/servers every 
iteration.
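As a rough sketch of what "incremental" means here (hypothetical names and normalization; not the actual LocalityBasedCostFunction code), the cost function keeps a running total and, when a single region move is proposed, only swaps out that region's contribution rather than rescanning every region and server:

{code}
// Sketch only: incremental cost bookkeeping for a proposed single-region move.
class IncrementalLocalityCost {
  private final double[][] localityOfRegionOnServer; // [region][server], computed up front
  private double totalCost;                          // running sum over all regions

  IncrementalLocalityCost(double[][] localityOfRegionOnServer, int[] regionToServer) {
    this.localityOfRegionOnServer = localityOfRegionOnServer;
    for (int region = 0; region < regionToServer.length; region++) {
      totalCost += costOf(region, regionToServer[region]);
    }
  }

  /** Called once per proposed move: O(1) instead of O(regions * servers). */
  void regionMoved(int region, int oldServer, int newServer) {
    totalCost += costOf(region, newServer) - costOf(region, oldServer);
  }

  double cost() {
    return totalCost;
  }

  private double costOf(int region, int server) {
    // Illustrative per-region cost; the patch normalizes against best possible locality.
    return 1.0 - localityOfRegionOnServer[region][server];
  }
}
{code}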

Further, we also cache the locality of every region on every server at the 
beginning of the balancer's execution for both the LocalityBasedCostFunction 
and the LocalityCandidateGenerator to reference. This way, they need not 
collect all HDFS blocks of every region at each iteration of the balancer.

The changes have been running in all 6 of our production clusters and all 4 QA 
clusters without issue. The speed improvements we noticed are massive. Our big 
clusters now consider 20x more cluster configurations.

One design decision I made is to consider locality cost as the difference 
between the best locality that is possible given the current cluster state, and 
the currently measured locality. The old locality computation would measure the 
locality cost as the difference from the current locality and 100% locality, 
but this new computation instead takes the difference between the current 
locality for a given region and the best locality for that region in the 
cluster.

  was:
We noticed that the stochastic load balancer was not scaling well with cluster 
size. That is to say that on our smaller clusters (~17 tables, ~12 region 
servers, ~5k regions), the balancer considers ~100,000 cluster configurations 
in 60s per balancer run, but only ~5,000 per 60s on our bigger clusters (~82 
tables, ~160 region servers, ~13k regions).

Because of this, our bigger clusters are not able to converge on balance as 
quickly for things like table skew, region load, etc. because the balancer does 
not have enough time to "think".

We have re-written the locality cost function to be incremental, meaning it 
only recomputes cost based on the most recent region move proposed by the 
balancer, rather than recomputing the cost across all regions/servers every 
iteration.

Further, we also cache the locality of every region on every server at the 
beginning of the balancer's execution for both the LocalityBasedCostFunction 
and the LocalityCandidateGenerator to reference. This way, they need not 
collect all HDFS blocks of every region at each iteration of the balancer.

The changes have been running in all 6 of our production clusters and all 4 QA 
clusters without issue. The speed improvements we noticed are massive. Our big 
clusters now consider 20x more cluster configurations.

We are currently preparing a patch for submission.


> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch
>
>
> We noticed that during the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions) .
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each 

[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-06 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Description: 
We noticed that the stochastic load balancer was not scaling well with cluster 
size. That is to say that on our smaller clusters (~17 tables, ~12 region 
servers, ~5k regions), the balancer considers ~100,000 cluster configurations 
in 60s per balancer run, but only ~5,000 per 60s on our bigger clusters (~82 
tables, ~160 region servers, ~13k regions).

Because of this, our bigger clusters are not able to converge on balance as 
quickly for things like table skew, region load, etc. because the balancer does 
not have enough time to "think".

We have re-written the locality cost function to be incremental, meaning it 
only recomputes cost based on the most recent region move proposed by the 
balancer, rather than recomputing the cost across all regions/servers every 
iteration.

Further, we also cache the locality of every region on every server at the 
beginning of the balancer's execution for both the LocalityBasedCostFunction 
and the LocalityCandidateGenerator to reference. This way, they need not 
collect all HDFS blocks of every region at each iteration of the balancer.
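A minimal sketch of that up-front caching, under the assumption that locality can be expressed as a fraction per (region, server) pair; the lookup function here stands in for whatever reads the HDFS block locations (names are illustrative):

{code}
import java.util.function.ToDoubleBiFunction;

// Sketch only: compute locality of every region on every server once per balancer run,
// so the cost function and candidate generator just read from the cache afterwards.
class LocalityCache {
  private final float[][] localityOfRegionOnServer; // [region][server]

  LocalityCache(int numRegions, int numServers,
                ToDoubleBiFunction<Integer, Integer> localityLookup) {
    localityOfRegionOnServer = new float[numRegions][numServers];
    for (int region = 0; region < numRegions; region++) {
      for (int server = 0; server < numServers; server++) {
        localityOfRegionOnServer[region][server] =
            (float) localityLookup.applyAsDouble(region, server);
      }
    }
  }

  float localityOf(int region, int server) {
    return localityOfRegionOnServer[region][server];
  }
}
{code}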

The changes have been running in all 6 of our production clusters and all 4 QA 
clusters without issue. The speed improvements we noticed are massive. Our big 
clusters now consider 20x more cluster configurations.

One other important design decision we made was to compute locality cost as a 
measure of how good locality currently is compared to the best it could be 
(given the current cluster state). The old cost function assumed that 100% 
locality was always possible, and calculated the cost as the difference from 
that state. Instead, this new computation computes the difference from the 
actual best locality found across the cluster. This means that if a 
region-server has 75% locality for a given region, but that region has less 
than 75% locality on all other servers, then the locality cost associated with 
that region is 0.
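The same idea as a tiny worked example (the normalization shown is illustrative; the patch's exact scaling may differ):

{code}
// Sketch: per-region locality cost relative to the best locality achievable for that
// region anywhere in the cluster, rather than relative to 100%.
final class RelativeLocalityCost {
  static double regionCost(double currentLocality, double[] localityOnEachServer) {
    double best = 0.0;
    for (double l : localityOnEachServer) {
      best = Math.max(best, l);
    }
    if (best == 0.0) {
      return 0.0; // no server holds any of the region's blocks; moving cannot help
    }
    return (best - currentLocality) / best;
  }

  public static void main(String[] args) {
    // The 75% case from the description: the region already sits on its best server.
    System.out.println(regionCost(0.75, new double[] {0.75, 0.40, 0.10})); // 0.0
    // If it instead sat on the 40% server, it would carry a nonzero cost.
    System.out.println(regionCost(0.40, new double[] {0.75, 0.40, 0.10})); // ~0.47
  }
}
{code}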

  was:
We noticed that the stochastic load balancer was not scaling well with cluster 
size. That is to say that on our smaller clusters (~17 tables, ~12 region 
servers, ~5k regions), the balancer considers ~100,000 cluster configurations 
in 60s per balancer run, but only ~5,000 per 60s on our bigger clusters (~82 
tables, ~160 region servers, ~13k regions).

Because of this, our bigger clusters are not able to converge on balance as 
quickly for things like table skew, region load, etc. because the balancer does 
not have enough time to "think".

We have re-written the locality cost function to be incremental, meaning it 
only recomputes cost based on the most recent region move proposed by the 
balancer, rather than recomputing the cost across all regions/servers every 
iteration.

Further, we also cache the locality of every region on every server at the 
beginning of the balancer's execution for both the LocalityBasedCostFunction 
and the LocalityCandidateGenerator to reference. This way, they need not 
collect all HDFS blocks of every region at each iteration of the balancer.

The changes have been running in all 6 of our production clusters and all 4 QA 
clusters without issue. The speed improvements we noticed are massive. Our big 
clusters now consider 20x more cluster configurations.

We are currently preparing a patch for submission.


> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch
>
>
> We noticed that during the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions) .
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the 

[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-06 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Description: 
We noticed that the stochastic load balancer was not scaling well with cluster 
size. That is to say that on our smaller clusters (~17 tables, ~12 region 
servers, ~5k regions), the balancer considers ~100,000 cluster configurations 
in 60s per balancer run, but only ~5,000 per 60s on our bigger clusters (~82 
tables, ~160 region servers, ~13k regions).

Because of this, our bigger clusters are not able to converge on balance as 
quickly for things like table skew, region load, etc. because the balancer does 
not have enough time to "think".

We have re-written the locality cost function to be incremental, meaning it 
only recomputes cost based on the most recent region move proposed by the 
balancer, rather than recomputing the cost across all regions/servers every 
iteration.

Further, we also cache the locality of every region on every server at the 
beginning of the balancer's execution for both the LocalityBasedCostFunction 
and the LocalityCandidateGenerator to reference. This way, they need not 
collect all HDFS blocks of every region at each iteration of the balancer.

The changes have been running in all 6 of our production clusters and all 4 QA 
clusters without issue. The speed improvements we noticed are massive. Our big 
clusters now consider 20x more cluster configurations.

We are currently preparing a patch for submission.

  was:
We noticed that the stochastic load balancer was not scaling well with cluster 
size. That is to say that on our smaller clusters (~17 tables, ~12 region 
servers, ~5k regions), the balancer considers ~100,000 cluster configurations 
in 60s per balancer run, but only ~5,000 per 60s on our bigger clusters (~82 
tables, ~160 region servers, ~13k regions).

Because of this, our bigger clusters are not able to converge on balance as 
quickly for things like table skew, region load, etc. because the balancer does 
not have enough time to "think".

We have re-written the locality cost function to be incremental, meaning it 
only recomputes cost based on the most recent region move proposed by the 
balancer, rather than recomputing the cost across all regions/servers every 
iteration.

Further, we also cache the locality of every region on every server at the 
beginning of the balancer's execution for both the LocalityBasedCostFunction 
and the LocalityCandidateGenerator to reference. This way, they need not 
collect all HDFS blocks of every region at each iteration of the balancer.

The changes have been running in all 6 of our production clusters and all 4 QA 
clusters without issue. The speed improvements we noticed are massive. Our big 
clusters now consider 20x more cluster configurations.

One other important design decision we made was to compute locality cost as a 
measure of how good locality currently is compared to the best it could be 
(given the current cluster state). The old cost function assumed that 100% 
locality was always possible, and calculated the cost as the difference from 
that state. Instead, this new computation computes the difference from the 
actual best locality found across the cluster. This means that if a 
region-server has 75% locality for a given region, but that region has less 
than 75% locality on all other servers, then the locality cost associated with 
that region is 0.


> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch
>
>
> We noticed that during the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions) .
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the 

[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-06 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16039830#comment-16039830
 ] 

Kahlil Oppenheimer commented on HBASE-18164:


[~busbey] Thanks for the correction on priority! I have a patch uploaded and 
ready for review :D.

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch
>
>
> We noticed that during the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions) .
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> We are currently preparing a patch for submission.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-06 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Attachment: HBASE-18164-00.patch

Patch with new locality cost function, candidate generator, and unit tests

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch
>
>
> We noticed that during the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions) .
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> We are currently preparing a patch for submission.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-06 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-18164:
---
Status: Patch Available  (was: Open)

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Attachments: HBASE-18164-00.patch
>
>
> We noticed that during the stochastic load balancer was not scaling well with 
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12 
> region servers, ~5k regions), the balancer considers ~100,000 cluster 
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger 
> clusters (~82 tables, ~160 region servers, ~13k regions) .
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> We are currently preparing a patch for submission.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HBASE-18164) Much faster locality cost function and candidate generator

2017-06-05 Thread Kahlil Oppenheimer (JIRA)
Kahlil Oppenheimer created HBASE-18164:
--

 Summary: Much faster locality cost function and candidate generator
 Key: HBASE-18164
 URL: https://issues.apache.org/jira/browse/HBASE-18164
 Project: HBase
  Issue Type: Improvement
  Components: Balancer
Reporter: Kahlil Oppenheimer
Assignee: Kahlil Oppenheimer
Priority: Minor


We noticed that the stochastic load balancer was not scaling well with cluster 
size. That is to say that on our smaller clusters (~17 tables, ~12 region 
servers, ~5k regions), the balancer considers ~100,000 cluster configurations 
in 60s per balancer run, but only ~5,000 per 60s on our bigger clusters (~82 
tables, ~160 region servers, ~13k regions).

Because of this, our bigger clusters are not able to converge on balance as 
quickly for things like table skew, region load, etc. because the balancer does 
not have enough time to "think".

We have re-written the locality cost function to be incremental, meaning it 
only recomputes cost based on the most recent region move proposed by the 
balancer, rather than recomputing the cost across all regions/servers every 
iteration.

Further, we also cache the locality of every region on every server at the 
beginning of the balancer's execution for both the LocalityBasedCostFunction 
and the LocalityCandidateGenerator to reference. This way, they need not 
collect all HDFS blocks of every region at each iteration of the balancer.
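
For illustration, a minimal sketch of the caching plus incremental-update idea 
described above (hypothetical class and field names, not the actual patch code):

{code:java}
// Hypothetical sketch only: locality is looked up once per balancer run and the
// running cost is adjusted per proposed move instead of rescanning every region.
public class CachedLocalityCostSketch {
  private final float[][] localityOfRegionOnServer; // [region][server], filled once up front
  private final int[] regionToServer;               // current region -> server assignment
  private double totalLocality;                     // running sum over all regions

  public CachedLocalityCostSketch(float[][] localityOfRegionOnServer, int[] regionToServer) {
    this.localityOfRegionOnServer = localityOfRegionOnServer;
    this.regionToServer = regionToServer;
    for (int region = 0; region < regionToServer.length; region++) {
      totalLocality += localityOfRegionOnServer[region][regionToServer[region]];
    }
  }

  /** O(1) update for a single proposed move, instead of recomputing over all regions/servers. */
  public void regionMoved(int region, int oldServer, int newServer) {
    totalLocality -= localityOfRegionOnServer[region][oldServer];
    totalLocality += localityOfRegionOnServer[region][newServer];
    regionToServer[region] = newServer;
  }

  /** Cost in [0, 1]: 0 when average locality is perfect, 1 when it is zero. */
  public double cost() {
    return 1.0 - totalLocality / regionToServer.length;
  }
}
{code}

The per-move update is what lets the balancer evaluate many more candidate 
configurations in the same 60s window.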

The changes have been running in all 6 of our production clusters and all 4 QA 
clusters without issue. The speed improvements we noticed are massive. Our big 
clusters now consider 20x more cluster configurations.

We are currently preparing a patch for submission.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-31 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031917#comment-16031917
 ] 

Kahlil Oppenheimer edited comment on HBASE-17707 at 5/31/17 8:38 PM:
-

We do not use read replicas in our clusters. That being said, I believe these 
changes should still function properly with read replicas enabled. The only 
issue we encountered formerly was that the table skew cost could actually 
exceed the region replica cost, causing multiple region replicas to be hosted 
on the same region server. This, however, is not an issue with the table skew 
changes, but an issue with the fact that region replicas are enforced as a soft 
constraint via the cost function, rather than as a hard constraint in the 
balancer logic. I believe that by adjusting the region replica cost logic to 
scale better to large cluster sizes (as I did in this patch), we mitigate this 
issue.
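
To illustrate the soft-versus-hard constraint point with made-up numbers (the 
weights and cost values below are purely hypothetical, not HBase defaults):

{code:java}
// Illustrative numbers only; shows how a weighted-sum (soft) constraint can trade a
// replica-placement violation for better table skew, which a hard constraint would forbid.
public class SoftConstraintSketch {
  public static void main(String[] args) {
    double replicaWeight = 100_000, tableSkewWeight = 35;
    // State A: replicas fully separated, but one table badly skewed.
    double stateA = replicaWeight * 0.0 + tableSkewWeight * 0.9;       // = 31.5
    // State B: a tiny replica co-location penalty, but much better table skew.
    double stateB = replicaWeight * 0.00001 + tableSkewWeight * 0.1;   // = 4.5
    // The weighted sum prefers B despite the replica violation; a hard constraint in the
    // balancer logic would reject B before the other cost terms were even compared.
    System.out.println(stateA > stateB ? "soft constraint picks B" : "soft constraint picks A");
  }
}
{code}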


was (Author: kahliloppenheimer):
We do not use read replicas in our clusters. That being said, I believe these 
changes should still function properly with read replicas enabled. The only 
issue we encountered formerly was that the table skew cost could actually 
exceed the region replica cost, causing multiple region replicas to be hosted 
on the same region server. This, however, is not an issue with the table skew 
changes, but an issue with the fact that region replicas are enforced as a soft 
constraint via the cost function, rather than a hard constraint. I believe that 
by adjusting the region replica cost logic to scale better to large cluster 
sizes (as I did in this patch), we mitigate this issue.

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> HBASE-17707-14.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.
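
As an illustration of the cost computation described above, a rough sketch 
(hypothetical names and a simplified move count, not the patch's actual 
TableSkewCostFunction):

{code:java}
// Hypothetical sketch: per-table minimal moves, normalized by the worst case,
// then a weighted average of the mean and max across tables, then sqrt.
public class TableSkewCostSketch {
  /** regionsPerServer[t][s] = number of regions of table t currently on server s. */
  public static double cost(int[][] regionsPerServer, double maxSkewWeight) {
    int numTables = regionsPerServer.length;
    double sum = 0, max = 0;
    for (int t = 0; t < numTables; t++) {
      int servers = regionsPerServer[t].length, total = 0;
      for (int count : regionsPerServer[t]) total += count;
      int ceil = (total + servers - 1) / servers;   // most regions any server should hold
      // Simplified minimal-move count: regions sitting above the per-server ceiling must move.
      int moves = 0;
      for (int count : regionsPerServer[t]) moves += Math.max(0, count - ceil);
      int worstCase = total - ceil;                 // entire table on one server
      double normalized = worstCase == 0 ? 0 : (double) moves / worstCase;
      sum += normalized;
      max = Math.max(max, normalized);
    }
    double avg = sum / numTables;
    double weighted = (1 - maxSkewWeight) * avg + maxSkewWeight * max;
    return Math.sqrt(weighted);                     // spread values more evenly over [0, 1]
  }
}
{code}

A larger maxSkewWeight penalizes one badly skewed table more strongly than many 
slightly skewed ones, which is the configurability mentioned above.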



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-31 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031917#comment-16031917
 ] 

Kahlil Oppenheimer commented on HBASE-17707:


We do not use read replicas in our clusters. That being said, I believe these 
changes should still function properly with read replicas enabled. The only 
issue we encountered formerly was that the table skew cost could actually 
exceed the region replica cost, causing multiple region replicas to be hosted 
on the same region server. This, however, is not an issue with my changes, but 
an issue with the fact that region replicas are enforced as a soft constraint 
via the cost function, rather than a hard constraint. I believe that by 
adjusting the region replica cost logic to scale better to large cluster sizes 
(as I did in this patch), we mitigate this issue.

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> HBASE-17707-14.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-31 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031917#comment-16031917
 ] 

Kahlil Oppenheimer edited comment on HBASE-17707 at 5/31/17 8:38 PM:
-

We do not use read replicas in our clusters. That being said, I believe these 
changes should still function properly with read replicas enabled. The only 
issue we encountered formerly was that the table skew cost could actually 
exceed the region replica cost, causing multiple region replicas to be hosted 
on the same region server. This, however, is not an issue with the table skew 
changes, but an issue with the fact that region replicas are enforced as a soft 
constraint via the cost function, rather than a hard constraint. I believe that 
by adjusting the region replica cost logic to scale better to large cluster 
sizes (as I did in this patch), we mitigate this issue.


was (Author: kahliloppenheimer):
We do not use read replicas in our clusters. That being said, I believe these 
changes should still function properly with read replicas enabled. The only 
issue we encountered formerly was that the table skew cost could actually 
exceed the region replica cost, causing multiple region replicas to be hosted 
on the same region server. This, however, is not an issue with my changes, but 
an issue with the fact that region replicas are enforced as a soft constraint 
via the cost function, rather than a hard constraint. I believe that by 
adjusting the region replica cost logic to scale better to large cluster sizes 
(as I did in this patch), we mitigate this issue.

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> HBASE-17707-14.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-31 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031812#comment-16031812
 ] 

Kahlil Oppenheimer commented on HBASE-17707:


[~tedyu] Additionally, we have been running this version of the balancer at 
HubSpot on all of our production and QA clusters for a few months now and have 
seen better results with table skew and no issues otherwise. Please let me know 
if there are still issues you'd like me to address.

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> HBASE-17707-14.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-31 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16031382#comment-16031382
 ] 

Kahlil Oppenheimer commented on HBASE-17707:


[~tedyu] I added logic to reset values in HBaseConfig before each test is run. 
One thing I noticed is that some tests would set values in the HBase config 
that would then carry over to other tests if the config was not reset between runs.
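
For illustration, one way such a per-test reset could look in a test base class 
(hypothetical sketch; the actual test code and configuration keys may differ):

{code:java}
// Illustrative only: rebuild the configuration from defaults before every test so
// values set by a previous test (e.g. custom cost-function weights) cannot leak.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.junit.Before;

public class BalancerConfigResetSketch {
  protected Configuration conf;

  @Before
  public void resetConfig() {
    conf = HBaseConfiguration.create();
  }
}
{code}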

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> HBASE-17707-14.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-31 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17707:
---
Status: Patch Available  (was: Open)

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> HBASE-17707-14.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-31 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17707:
---
Attachment: HBASE-17707-14.patch

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> HBASE-17707-14.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-31 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17707:
---
Status: Open  (was: Patch Available)

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> HBASE-17707-14.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work stopped] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-31 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-17707 stopped by Kahlil Oppenheimer.
--
> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> HBASE-17707-14.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-31 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-17707 started by Kahlil Oppenheimer.
--
> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> HBASE-17707-14.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-30 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17707:
---
Status: Open  (was: Patch Available)

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-30 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17707:
---
Attachment: HBASE-17707-13.patch

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-30 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17707:
---
Status: Patch Available  (was: Open)

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-30 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17707:
---
Attachment: (was: HBASE-17707-13.patch)

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-30 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029941#comment-16029941
 ] 

Kahlil Oppenheimer commented on HBASE-17707:


[~enis] [~tedyu], sorry I had taken a quick break from this, but just got back 
to it. I've uploaded yet another version of the patch. Hopefully, this 
addresses all of your concerns.

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-30 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17707:
---
Status: Patch Available  (was: In Progress)

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-30 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-17707 started by Kahlil Oppenheimer.
--
> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-05-30 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17707:
---
Attachment: HBASE-17707-13.patch

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, HBASE-17707-13.patch, 
> test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-23 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938578#comment-15938578
 ] 

Kahlil Oppenheimer edited comment on HBASE-17707 at 3/23/17 3:34 PM:
-

bq. We cannot maintain two different cost functions for table skew. Let's 
remove the old one from the code, and only have the new implementation in this 
patch. We cannot have dead code lying around and rot. We can close HBASE-17706 
as won't fix.
I will add the removal of this old cost function to my patch.

bq. The new candidate generator TableSkewCandidateGenerator is not added to the 
SLB::candidateGenerators field which means that it is not used? I can only see 
the test using it. Is this intended? It has to be enabled by default.
Good catch on the table skew candidate generator. I will add that to this 
patch as well. I was originally going to do it in a separate patch, but it 
makes much more sense to just do it here.

bq. Did you intend to use the raw variable here instead of calling scale again:
Yup! Let's call R the range [0, 1]. We know that scale() maps values into R. We 
also know that sqrt() maps values from R -> R. Lastly, we know that .9 * r + .1 
for any r in R yields another value in R. So we can be sure the outcome is in 
R, with no need to call the scale function again :).
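
A small standalone sketch of that range argument, assuming scale() clamps its 
result into [0, 1] (the helper and numbers below are illustrative, not the 
balancer's actual code):

{code:java}
// Illustrative only: shows that sqrt and the 0.9*x + 0.1 affine step keep a [0, 1] value in [0, 1].
public class ScaleRangeSketch {
  // Assumed to behave like the balancer's scale(): normalize and clamp into [0, 1].
  static double scale(double min, double max, double value) {
    if (max <= min) return 0.0;
    return Math.max(0.0, Math.min(1.0, (value - min) / (max - min)));
  }

  public static void main(String[] args) {
    double r = scale(0, 500, 123);            // r is in [0, 1]
    double out = 0.9 * Math.sqrt(r) + 0.1;    // sqrt: [0,1] -> [0,1]; 0.9*x + 0.1: [0,1] -> [0.1, 1]
    System.out.println(r + " -> " + out);     // still inside [0, 1], so no second scale() call is needed
  }
}
{code}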

Before opening the patch, I'm just repeatedly running the tests 100s of times 
to feel more confident I haven't missed edge cases since a lot of these test 
failures are very non-deterministic.


was (Author: kahliloppenheimer):
bq. We cannot maintain two different cost functions for table skew. Let's 
remove the old one from the code, and only have the new implementation in this 
patch. We cannot have dead code lying around and rot. We can close HBASE-17706 
as won't fix.
I will add the removal of this old cost function to my patch.

bq. The new candidate generator TableSkewCandidateGenerator is not added to the 
SLB::candidateGenerators field which means that it is not used? I can only see 
the test using it. Is this intended? It has to be enabled by default.
Good catch on the table skew candidate generator. I will add that to this 
patch as well. I was originally going to do it in a separate patch, but it 
makes much more sense to just do it here.

bq. Did you intend to use the raw variable here instead of calling scale again:
Yup! Let's call R the range [0, 1]. We know that scale() maps values into R. We 
also know that sqrt() maps values from R -> R. Lastly, we know that .9 * r + .1 
for any r in R yields another value in R. So we can be sure the outcome is in R. 
No need to call the scale function :).



> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.

[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-23 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938578#comment-15938578
 ] 

Kahlil Oppenheimer commented on HBASE-17707:


bq. We cannot maintain two different cost functions for table skew. Let's 
remove the old one from the code, and only have the new implementation in this 
patch. We cannot have dead code lying around and rot. We can close HBASE-17706 
as won't fix.
I will add the removal of this old cost function to my patch.

bq. The new candidate generator TableSkewCandidateGenerator is not added to the 
SLB::candidateGenerators field which means that it is not used? I can only see 
the test using it. Is this intended? It has to be enabled by default.
Good catch on the table skew candidate generator. I will add that to this 
patch as well. I was originally going to do it in a separate patch, but it 
makes much more sense to just do it here.

bq. Did you intend to use the raw variable here instead of calling scale again:
Yup! Let's call R the range [0, 1]. We know that scale() maps values into R. We 
also know that sqrt() maps values from R -> R. Lastly, we know that .9 * r + .1 
for any r in R yields another value in R. So we can be sure the outcome is in R. 
No need to call the scale function :).



> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.
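To make the two-phase behaviour a bit more concrete, here is a toy, standalone sketch of the first phase only (moving regions from the most-loaded to the least-loaded server for a table). It is illustrative and does not use the balancer's real cluster model; the real generator additionally proposes swaps that each improve table skew.

{code:java}
// Toy phase-1 logic: propose a move from the most-loaded to the least-loaded
// server for a single table.
final class TableSkewMoveSketch {

  /** Returns {fromServer, toServer}, or null if the table is already spread evenly. */
  static int[] pickMove(int[] regionsPerServer) {
    int from = 0;
    int to = 0;
    for (int s = 1; s < regionsPerServer.length; s++) {
      if (regionsPerServer[s] > regionsPerServer[from]) {
        from = s;
      }
      if (regionsPerServer[s] < regionsPerServer[to]) {
        to = s;
      }
    }
    // A move only helps once the most-loaded server holds at least two more
    // regions for this table than the least-loaded one.
    return regionsPerServer[from] - regionsPerServer[to] >= 2 ? new int[] {from, to} : null;
  }

  public static void main(String[] args) {
    int[] counts = {5, 1, 2}; // server 0 is overloaded for this table
    int[] move = pickMove(counts);
    System.out.println(move == null
        ? "balanced"
        : "move a region from server " + move[0] + " to server " + move[1]);
  }
}
{code}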



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-22 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17707:
---
Status: Patch Available  (was: Open)

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-22 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17707:
---
Status: Open  (was: Patch Available)

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-22 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17707:
---
Attachment: HBASE-17707-12.patch

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, HBASE-17707-12.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-22 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15936530#comment-15936530
 ] 

Kahlil Oppenheimer commented on HBASE-17707:


Sorry, I just realized it was unclear because the unit test was pre-existing 
(before my patch) for the old table skew cost function, but now applies to the 
new one. It is found at TestStochasticLoadBalancer::testTableSkewCost. Also it 
*is* a hard guarantee that {{numMovesPerTable <= pathologicalNumMoves}}. I made 
sure to be consistent with the other cost functions when creating this one.

The issue is that the old table skew cost function was fundamentally broken. It 
did not change its cost estimate as the balancer proposed region moves/swaps, 
meaning the table skew cost it estimated at the beginning of balancing was 
often the same as at the end, which meant it actually played no role in the 
balancing at all. I have a separate issue open, HBASE-17706, that fixes the 
behavior of the old TableSkewCostFunction if people would still like to use it, 
but I can't merge that one until this one gets resolved. In any case, it does 
not surprise me that this new cost function would alter behavior, because we are 
effectively considering table skew for the first time in the balancing 
process.

I'll go ahead and rebase/resubmit a new patch that includes the new table skew 
stuff as well as the fix to the region replica host cost function.



> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-19 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931834#comment-15931834
 ] 

Kahlil Oppenheimer edited comment on HBASE-17707 at 3/19/17 5:21 PM:
-

[~enis] [~tedyu] the new table skew cost function is actually guaranteed to be 
within the [0-1] range (and this behavior is even unit tested!). So the cost 
function is not dominating the other cost functions by falling outside the 
[0-1] range. Instead, I debugged the breaking test and found that the issue is 
that the region replica host cost function can produce very small values when 
there are a lot of regions. In my testing, I found that for some medium-large 
cluster sizes, the cost function can produce values as small as 2.6 x 10^(-6). 
Sadly, this means that even with a weight of 5000 (which is what is set in the 
test), the "soft" requirement of having no two region replicas hosted on the 
same machine when it is avoidable is not met because the cost function has too 
small a contribution (even with this high weight). Instead, my latest patch 
updates the region replica cost function to give it a minimum value (.1) for 
any number of co-hosted replicas. This makes it so that if two region replicas 
are placed on the same host, the cost will be at least .1 (whether there are 5 
or 1,000,000 regions in the cluster). This better enforces the "soft" 
constraint as it makes sure that no other cost functions can overpower the 
region replica host cost function.
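The flooring idea, as a standalone illustration (not the actual RegionReplicaHostCostFunction code; the raw ratio below is a stand-in for the real cost computation, which is more involved):

{code:java}
// Illustration of flooring the co-hosted-replica cost at 0.1 so it cannot be
// drowned out on large clusters.
final class ReplicaCostFloorSketch {

  static double cost(int coHostedReplicaPairs, int totalRegions) {
    if (coHostedReplicaPairs == 0) {
      return 0.0;
    }
    double raw = (double) coHostedReplicaPairs / totalRegions; // ~1e-6 on big clusters
    return Math.max(0.1, raw);
  }

  public static void main(String[] args) {
    System.out.println(cost(1, 400_000)); // 0.1 rather than 2.5E-6
    System.out.println(cost(0, 400_000)); // 0.0
  }
}
{code}

With the test's weight of 5000, that floor contributes at least 500 to the weighted cost, so no combination of the other functions should be able to make co-hosting look attractive.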


was (Author: kahliloppenheimer):
[~enis] [~tedyu] the new table skew cost function is actually guaranteed to be 
within the [0-1] range (and this behavior is even unit tested!). So the cost 
function is not dominating the other cost functions by falling outside the 
[0-1] range. Instead, I debugged the breaking test and found that the issue is 
that the region replica host cost function can produce very small values when 
there are a lot of regions. In my testing, I found that for some medium-large 
cluster sizes, the cost function can produce values as small as 2.6 x 10^(-6). 
Sadly, this means that even with a weight of 5000 (which is what is set in the 
test), the "soft" requirement of having no two region replicas hosted on the 
same machine when it is avoidable is not met because the cost function has too 
small a contribution (even with this high weight). Instead, my latest patch 
updates the region replica cost function to give it a minimum value (.1) for 
any number of co-hosted replicas. This makes it so that if two region replicas 
are placed on the same host, the cost will be at least .1 (whether or not there 
are 5 or 1,000,000 regions in the cluster). This better enforces the "soft" 
constraint as it makes sure that no other cost functions can overpower the 
region replica host cost function.

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction.

[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-19 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931834#comment-15931834
 ] 

Kahlil Oppenheimer commented on HBASE-17707:


[~enis] [~tedyu] the new table skew cost function is actually guaranteed to be 
within the [0-1] range (and this behavior is even unit tested!). So the cost 
function is not dominating the other cost functions by falling outside the 
[0-1] range. Instead, I debugged the breaking test and found that the issue is 
that the region replica host cost function can produce very small values when 
there are a lot of regions. In my testing, I found that for some medium-large 
cluster sizes, the cost function can produce values as small as 2.6 x 10^(-6). 
Sadly, this means that even with a weight of 5000 (which is what is set in the 
test), the "soft" requirement of having no two region replicas hosted on the 
same machine when it is avoidable is not met because the cost function has too 
small a contribution (even with this high weight). Instead, my latest patch 
updates the region replica cost function to give it a minimum value (.1) for 
any number of co-hosted replicas. This makes it so that if two region replicas 
are placed on the same host, the cost will be at least .1 (whether or not there 
are 5 or 1,000,000 regions in the cluster). This better enforces the "soft" 
constraint as it makes sure that no other cost functions can overpower the 
region replica host cost function.

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-17 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930268#comment-15930268
 ] 

Kahlil Oppenheimer commented on HBASE-17707:


I just added a patch that will make the RegionReplicaHostCostFunction return 
higher values even when the cluster is very large (a minimum value of .1 for 
any co-hosted replicas). I believe this solution will scale better because it 
will more logically preserve the constraint that the test is checking for (that 
absolutely no region replicas end up on the same host), even as people add new 
cost functions and such.

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-17 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17707:
---
Attachment: HBASE-17707-11.patch

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-17 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17707:
---
Status: Patch Available  (was: Reopened)

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-17 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17707:
---
Attachment: HBASE-17707-11.patch

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, HBASE-17707-11.patch, 
> HBASE-17707-11.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-17 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930137#comment-15930137
 ] 

Kahlil Oppenheimer commented on HBASE-17707:


The issue here is that the tests are checking to make sure no two replicas of 
the same region end up on the same host, but that constraint is only enforced 
by having a very high weight associated with the RegionReplicaHost cost 
function. The problem is that even with a very high weight (like 5000), 
the cost value for this function can get really small (like 2.6 x 10^(-6)) as the 
number of regions grows large. Thus, the balancer might decide to move a 
replica of a region to the same host as another, because doing so benefits other 
cost functions (like table skew) while the RegionReplicaHost cost is too small 
to matter. I have two solutions that would fix this:

1) Disable table skew generator/cost function when there are region replicas.

2) Change the RegionReplicaHost cost function to make the cost super high for 
any number of replicas on the same host, regardless of how many regions are in 
the cluster.

What are your thoughts?

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required for a given table to perfectly balance the 
> table across the cluster (i.e. as if the regions from that table had been 
> round-robin-ed across the cluster). This number of moves is computed for each 
> table, then normalized to a score between 0-1 by dividing by the number of 
> moves required in the absolute worst case (i.e. the entire table is stored on 
> one server), and stored in an array. The cost function then takes a weighted 
> average of the average and maximum value across all tables. The weights in 
> this average are configurable to allow for certain users to more strongly 
> penalize situations where one table is skewed versus where every table is a 
> little bit skewed. To better spread this value more evenly across the range 
> 0-1, we take the square root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first simply tries to move regions until 
> each server has the right number of regions, then it swaps regions around 
> such that each region swap improves table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17706) TableSkewCostFunction improperly computes max skew

2017-03-16 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17706:
---
Attachment: HBASE-17706-07.patch

> TableSkewCostFunction improperly computes max skew
> --
>
> Key: HBASE-17706
> URL: https://issues.apache.org/jira/browse/HBASE-17706
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
>  Labels: patch
> Attachments: HBASE-17706-01.patch, HBASE-17706-02.patch, 
> HBASE-17706-03.patch, HBASE-17706-04.patch, HBASE-17706-05.patch, 
> HBASE-17706-06.patch, HBASE-17706-07.patch, HBASE-17706.patch
>
>
> We noticed while running unit tests that the TableSkewCostFunction computed 
> cost did not change as the balancer ran and simulated moves across the 
> cluster. After investigating, we found that this happened in particular when 
> the cluster started out with at least one table very strongly skewed.
> We noticed that the TableSkewCostFunction depends on a field of the 
> BaseLoadBalancer.Cluster class called numMaxRegionsPerTable, but this field 
> is not properly maintained as regionMoves are simulated for the cluster. The 
> field only ever increases as the maximum number of regions per table 
> increases, but it does not decrease as the maximum number per table goes down.
> This patch corrects that behavior so that the field is accurately maintained, 
> and thus the TableSkewCostFunction produces a more correct value as the 
> balancer runs.
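A simplified stand-in for the bookkeeping problem described above (not the actual BaseLoadBalancer.Cluster implementation; field names mirror the description but the structure is invented for the example). The key point is that the running maximum must be recomputed when the previously-maximal server for a table gives a region away, not only bumped when some count grows:

{code:java}
// Maintaining a per-table "max regions on any server" counter as region moves
// are simulated.
final class MaxRegionsPerTableSketch {
  private final int[][] regionsPerTablePerServer; // [table][server] -> region count
  private final int[] numMaxRegionsPerTable;      // [table] -> max over all servers

  MaxRegionsPerTableSketch(int[][] counts) {
    regionsPerTablePerServer = counts;
    numMaxRegionsPerTable = new int[counts.length];
    for (int t = 0; t < counts.length; t++) {
      numMaxRegionsPerTable[t] = max(counts[t]);
    }
  }

  /** Simulate moving one region of the given table between two servers. */
  void regionMoved(int table, int fromServer, int toServer) {
    int[] counts = regionsPerTablePerServer[table];
    counts[fromServer]--;
    counts[toServer]++;
    if (counts[toServer] > numMaxRegionsPerTable[table]) {
      // The easy case: the maximum only grew.
      numMaxRegionsPerTable[table] = counts[toServer];
    } else if (counts[fromServer] + 1 == numMaxRegionsPerTable[table]) {
      // The donor used to hold the maximum, so the maximum may have gone down: recompute.
      numMaxRegionsPerTable[table] = max(counts);
    }
  }

  private static int max(int[] a) {
    int m = 0;
    for (int v : a) {
      m = Math.max(m, v);
    }
    return m;
  }
}
{code}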



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17706) TableSkewCostFunction improperly computes max skew

2017-03-16 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17706:
---
Status: Patch Available  (was: Open)

> TableSkewCostFunction improperly computes max skew
> --
>
> Key: HBASE-17706
> URL: https://issues.apache.org/jira/browse/HBASE-17706
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
>  Labels: patch
> Attachments: HBASE-17706-01.patch, HBASE-17706-02.patch, 
> HBASE-17706-03.patch, HBASE-17706-04.patch, HBASE-17706-05.patch, 
> HBASE-17706-06.patch, HBASE-17706-07.patch, HBASE-17706.patch
>
>
> We noticed while running unit tests that the TableSkewCostFunction computed 
> cost did not change as the balancer ran and simulated moves across the 
> cluster. After investigating, we found that this happened in particular when 
> the cluster started out with at least one table very strongly skewed.
> We noticed that the TableSkewCostFunction depends on a field of the 
> BaseLoadBalancer.Cluster class called numMaxRegionsPerTable, but this field 
> is not properly maintained as regionMoves are simulated for the cluster. The 
> field only ever increases as the maximum number of regions per table 
> increases, but it does not decrease as the maximum number per table goes down.
> This patch corrects that behavior so that the field is accurately maintained, 
> and thus the TableSkewCostFunction produces a more correct value as the 
> balancer runs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17706) TableSkewCostFunction improperly computes max skew

2017-03-16 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17706:
---
Status: Open  (was: Patch Available)

> TableSkewCostFunction improperly computes max skew
> --
>
> Key: HBASE-17706
> URL: https://issues.apache.org/jira/browse/HBASE-17706
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
>  Labels: patch
> Attachments: HBASE-17706-01.patch, HBASE-17706-02.patch, 
> HBASE-17706-03.patch, HBASE-17706-04.patch, HBASE-17706-05.patch, 
> HBASE-17706-06.patch, HBASE-17706-07.patch, HBASE-17706.patch
>
>
> We noticed while running unit tests that the TableSkewCostFunction computed 
> cost did not change as the balancer ran and simulated moves across the 
> cluster. After investigating, we found that this happened in particular when 
> the cluster started out with at least one table very strongly skewed.
> We noticed that the TableSkewCostFunction depends on a field of the 
> BaseLoadBalancer.Cluster class called numMaxRegionsPerTable, but this field 
> is not properly maintained as regionMoves are simulated for the cluster. The 
> field only ever increases as the maximum number of regions per table 
> increases, but it does not decrease as the maximum number per table goes down.
> This patch corrects that behavior so that the field is accurately maintained, 
> and thus the TableSkewCostFunction produces a more correct value as the 
> balancer runs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17706) TableSkewCostFunction improperly computes max skew

2017-03-16 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928158#comment-15928158
 ] 

Kahlil Oppenheimer commented on HBASE-17706:


[~tedyu] I just rebased the patch and resubmitted it. It might break on one 
test in TestStochasticLoadBalancer2 until HBASE-17707 is merged in (which 
contained a fix that affects this test).

> TableSkewCostFunction improperly computes max skew
> --
>
> Key: HBASE-17706
> URL: https://issues.apache.org/jira/browse/HBASE-17706
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
>  Labels: patch
> Attachments: HBASE-17706-01.patch, HBASE-17706-02.patch, 
> HBASE-17706-03.patch, HBASE-17706-04.patch, HBASE-17706-05.patch, 
> HBASE-17706-06.patch, HBASE-17706.patch
>
>
> We noticed while running unit tests that the TableSkewCostFunction computed 
> cost did not change as the balancer ran and simulated moves across the 
> cluster. After investigating, we found that this happened in particular when 
> the cluster started out with at least one table very strongly skewed.
> We noticed that the TableSkewCostFunction depends on a field of the 
> BaseLoadBalancer.Cluster class called numMaxRegionsPerTable, but this field 
> is not properly maintained as regionMoves are simulated for the cluster. The 
> field only ever increases as the maximum number of regions per table 
> increases, but it does not decrease as the maximum number per table goes down.
> This patch corrects that behavior so that the field is accurately maintained, 
> and thus the TableSkewCostFunction produces a more correct value as the 
> balancer runs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17706) TableSkewCostFunction improperly computes max skew

2017-03-16 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17706:
---
Status: Open  (was: Patch Available)

> TableSkewCostFunction improperly computes max skew
> --
>
> Key: HBASE-17706
> URL: https://issues.apache.org/jira/browse/HBASE-17706
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
>  Labels: patch
> Attachments: HBASE-17706-01.patch, HBASE-17706-02.patch, 
> HBASE-17706-03.patch, HBASE-17706-04.patch, HBASE-17706-05.patch, 
> HBASE-17706-06.patch, HBASE-17706.patch
>
>
> We noticed while running unit tests that the TableSkewCostFunction computed 
> cost did not change as the balancer ran and simulated moves across the 
> cluster. After investigating, we found that this happened in particular when 
> the cluster started out with at least one table very strongly skewed.
> We noticed that the TableSkewCostFunction depends on a field of the 
> BaseLoadBalancer.Cluster class called numMaxRegionsPerTable, but this field 
> is not properly maintained as regionMoves are simulated for the cluster. The 
> field only ever increases as the maximum number of regions per table 
> increases, but it does not decrease as the maximum number per table goes down.
> This patch corrects that behavior so that the field is accurately maintained, 
> and thus the TableSkewCostFunction produces a more correct value as the 
> balancer runs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17706) TableSkewCostFunction improperly computes max skew

2017-03-16 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17706:
---
Status: Patch Available  (was: Open)

> TableSkewCostFunction improperly computes max skew
> --
>
> Key: HBASE-17706
> URL: https://issues.apache.org/jira/browse/HBASE-17706
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
>  Labels: patch
> Attachments: HBASE-17706-01.patch, HBASE-17706-02.patch, 
> HBASE-17706-03.patch, HBASE-17706-04.patch, HBASE-17706-05.patch, 
> HBASE-17706-06.patch, HBASE-17706.patch
>
>
> We noticed while running unit tests that the TableSkewCostFunction computed 
> cost did not change as the balancer ran and simulated moves across the 
> cluster. After investigating, we found that this happened in particular when 
> the cluster started out with at least one table very strongly skewed.
> We noticed that the TableSkewCostFunction depends on a field of the 
> BaseLoadBalancer.Cluster class called numMaxRegionsPerTable, but this field 
> is not properly maintained as regionMoves are simulated for the cluster. The 
> field only ever increases as the maximum number of regions per table 
> increases, but it does not decrease as the maximum number per table goes down.
> This patch corrects that behavior so that the field is accurately maintained, 
> and thus the TableSkewCostFunction produces a more correct value as the 
> balancer runs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17706) TableSkewCostFunction improperly computes max skew

2017-03-16 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17706:
---
Attachment: HBASE-17706-06.patch

> TableSkewCostFunction improperly computes max skew
> --
>
> Key: HBASE-17706
> URL: https://issues.apache.org/jira/browse/HBASE-17706
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
>  Labels: patch
> Attachments: HBASE-17706-01.patch, HBASE-17706-02.patch, 
> HBASE-17706-03.patch, HBASE-17706-04.patch, HBASE-17706-05.patch, 
> HBASE-17706-06.patch, HBASE-17706.patch
>
>
> We noticed while running unit tests that the TableSkewCostFunction computed 
> cost did not change as the balancer ran and simulated moves across the 
> cluster. After investigating, we found that this happened in particular when 
> the cluster started out with at least one table very strongly skewed.
> We noticed that the TableSkewCostFunction depends on a field of the 
> BaseLoadBalancer.Cluster class called numMaxRegionsPerTable, but this field 
> is not properly maintained as regionMoves are simulated for the cluster. The 
> field only ever increases as the maximum number of regions per table 
> increases, but it does not decrease as the maximum number per table goes down.
> This patch corrects that behavior so that the field is accurately maintained, 
> and thus the TableSkewCostFunction produces a more correct value as the 
> balancer runs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-15 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926707#comment-15926707
 ] 

Kahlil Oppenheimer commented on HBASE-17707:


[~tedyu]: I just fixed the patch so that it should no longer fail that test. The 
problem was that the RegionReplicaHostCostFunction produces very small values as 
the cluster grows large. In clusters with many regions, this made the balancer 
choose plans that put two replicas of the same region on the same host. To 
prevent this, I apply a square root to the RegionReplicaHostCostFunction's value 
so it stays better distributed across the range 0-1 even as the cluster scales 
up in size.
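
To illustrate the idea (a minimal sketch, not the actual HBase cost-function code; the method name below is made up for the example): square-rooting a raw cost in the range 0-1 pulls very small values away from zero so they still carry weight against the other cost functions on large clusters.

{code:java}
// Illustrative sketch only -- scaledReplicaHostCost() is a hypothetical name,
// not the actual HBase balancer API.
public class ReplicaCostScalingSketch {

  /** Spread a raw cost in [0, 1] away from zero by taking its square root. */
  static double scaledReplicaHostCost(double rawCost) {
    double clamped = Math.max(0.0, Math.min(1.0, rawCost));
    return Math.sqrt(clamped);
  }

  public static void main(String[] args) {
    double raw = 0.0004; // tiny raw cost on a large cluster
    // Prints 0.02 -- far more visible relative to other cost functions.
    System.out.println(scaledReplicaHostCost(raw));
  }
}
{code}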

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required to perfectly balance a given table across the 
> cluster (i.e. as if the regions from that table had been round-robined across 
> the cluster). This number of moves is computed for each table, normalized to a 
> score between 0 and 1 by dividing by the number of moves required in the 
> absolute worst case (i.e. the entire table stored on one server), and stored 
> in an array. The cost function then takes a weighted average of the average 
> and maximum value across all tables. The weights in this average are 
> configurable so that users can more strongly penalize situations where one 
> table is heavily skewed versus where every table is a little bit skewed. To 
> spread this value more evenly across the range 0-1, we take the square root 
> of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first moves regions until each server has 
> the right number of regions, then swaps regions such that each swap improves 
> table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.
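
As a rough sketch of the computation described above (the class name, method signature, and weighting scheme here are assumptions for illustration, not the patch's actual code):

{code:java}
// Illustrative sketch of the described table-skew cost: normalize each
// table's minimal move count by its worst case, take a weighted average of
// the max and mean, then square-root the result.
import java.util.Arrays;

public class TableSkewCostSketch {

  /**
   * @param movesPerTable     minimal moves needed to balance each table
   * @param worstCasePerTable moves needed if that table sat entirely on one server
   * @param maxWeight         weight on the worst (max) table score
   * @param avgWeight         weight on the average table score
   * @return a cost in [0, 1]
   */
  static double tableSkewCost(int[] movesPerTable, int[] worstCasePerTable,
                              double maxWeight, double avgWeight) {
    double[] scores = new double[movesPerTable.length];
    for (int i = 0; i < movesPerTable.length; i++) {
      // Normalize each table's move count by its worst case.
      scores[i] = worstCasePerTable[i] == 0
          ? 0.0 : (double) movesPerTable[i] / worstCasePerTable[i];
    }
    double max = Arrays.stream(scores).max().orElse(0.0);
    double avg = Arrays.stream(scores).average().orElse(0.0);
    // Weighted average of max and mean, square-rooted to spread the final
    // value more evenly across [0, 1].
    double weighted = (maxWeight * max + avgWeight * avg) / (maxWeight + avgWeight);
    return Math.sqrt(weighted);
  }

  public static void main(String[] args) {
    // One heavily skewed table (8 moves of a 9-move worst case), two balanced tables.
    System.out.println(
        tableSkewCost(new int[] {8, 0, 0}, new int[] {9, 9, 9}, 2.0, 1.0));
  }
}
{code}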



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-15 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17707:
---
Status: Patch Available  (was: Open)

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required to perfectly balance a given table across the 
> cluster (i.e. as if the regions from that table had been round-robined across 
> the cluster). This number of moves is computed for each table, normalized to a 
> score between 0 and 1 by dividing by the number of moves required in the 
> absolute worst case (i.e. the entire table stored on one server), and stored 
> in an array. The cost function then takes a weighted average of the average 
> and maximum value across all tables. The weights in this average are 
> configurable so that users can more strongly penalize situations where one 
> table is heavily skewed versus where every table is a little bit skewed. To 
> spread this value more evenly across the range 0-1, we take the square root 
> of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first moves regions until each server has 
> the right number of regions, then swaps regions such that each swap improves 
> table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-15 Thread Kahlil Oppenheimer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kahlil Oppenheimer updated HBASE-17707:
---
Attachment: HBASE-17707-09.patch

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch, HBASE-17707-09.patch, test-balancer2-13617.out
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required to perfectly balance a given table across the 
> cluster (i.e. as if the regions from that table had been round-robined across 
> the cluster). This number of moves is computed for each table, normalized to a 
> score between 0 and 1 by dividing by the number of moves required in the 
> absolute worst case (i.e. the entire table stored on one server), and stored 
> in an array. The cost function then takes a weighted average of the average 
> and maximum value across all tables. The weights in this average are 
> configurable so that users can more strongly penalize situations where one 
> table is heavily skewed versus where every table is a little bit skewed. To 
> spread this value more evenly across the range 0-1, we take the square root 
> of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first moves regions until each server has 
> the right number of regions, then swaps regions such that each swap improves 
> table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-08 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901555#comment-15901555
 ] 

Kahlil Oppenheimer commented on HBASE-17707:


Investigating now

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required to perfectly balance a given table across the 
> cluster (i.e. as if the regions from that table had been round-robined across 
> the cluster). This number of moves is computed for each table, normalized to a 
> score between 0 and 1 by dividing by the number of moves required in the 
> absolute worst case (i.e. the entire table stored on one server), and stored 
> in an array. The cost function then takes a weighted average of the average 
> and maximum value across all tables. The weights in this average are 
> configurable so that users can more strongly penalize situations where one 
> table is heavily skewed versus where every table is a little bit skewed. To 
> spread this value more evenly across the range 0-1, we take the square root 
> of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first moves regions until each server has 
> the right number of regions, then swaps regions such that each swap improves 
> table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HBASE-17707) New More Accurate Table Skew cost function/generator

2017-03-08 Thread Kahlil Oppenheimer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901555#comment-15901555
 ] 

Kahlil Oppenheimer edited comment on HBASE-17707 at 3/8/17 5:00 PM:


Investigating now. If you'd like, you can roll back until I have a fix ready.


was (Author: kahliloppenheimer):
Investigating now

> New More Accurate Table Skew cost function/generator
> 
>
> Key: HBASE-17707
> URL: https://issues.apache.org/jira/browse/HBASE-17707
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Affects Versions: 1.2.0
> Environment: CentOS Derivative with a derivative of the 3.18.43 
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Minor
> Fix For: 2.0
>
> Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch, 
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch, 
> HBASE-17707-05.patch, HBASE-17707-06.patch, HBASE-17707-07.patch, 
> HBASE-17707-08.patch
>
>
> This patch includes a new version of the TableSkewCostFunction and a new 
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal 
> number of region moves required to perfectly balance a given table across the 
> cluster (i.e. as if the regions from that table had been round-robined across 
> the cluster). This number of moves is computed for each table, normalized to a 
> score between 0 and 1 by dividing by the number of moves required in the 
> absolute worst case (i.e. the entire table stored on one server), and stored 
> in an array. The cost function then takes a weighted average of the average 
> and maximum value across all tables. The weights in this average are 
> configurable so that users can more strongly penalize situations where one 
> table is heavily skewed versus where every table is a little bit skewed. To 
> spread this value more evenly across the range 0-1, we take the square root 
> of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize 
> the above TableSkewCostFunction. It first moves regions until each server has 
> the right number of regions, then swaps regions such that each swap improves 
> table skew across the cluster.
> We tested the cost function and generator in our production clusters with 
> 100s of TBs of data and 100s of tables across dozens of servers and found 
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

