[jira] [Updated] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers

2021-04-04 Thread Reid Chan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reid Chan updated HBASE-20643:
--
Fix Version/s: (was: 1.7.0)
   1.8.0

> Getting HDFSBlockDist in Master by querying RegionServers
> -
>
> Key: HBASE-20643
> URL: https://issues.apache.org/jira/browse/HBASE-20643
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
>  Labels: balancer
> Fix For: 3.0.0-alpha-1, 1.8.0
>
>
> Region locality information is needed by the balancer to generate region 
> plans. Computing HDFSBlockDistribution is expensive on larger clusters and 
> adds load to the NameNode. This also needs to be recomputed on a master 
> restart. The proposal is to get the HDFSBlockDistribution from the 
> RegionServers instead of computing it in Master. RS already has this 
> information and we could just reuse it by querying it. RS already passes 
> dataLocality info via RegionLoad today.
> Proposed Implementation: This is a high-level overview.
> # A RegionServer API has to be added which will return HDFSBlockDistribution 
> for all the regions it hosts. RS already has this info. Since ClusterStatus 
> has already become bulky and we don’t need updated locality so fast, it’s 
> better to have another API rather than add this to RegionLoad and pass it 
> along with RSReport.
> # Master will have a Chore to query all RegionServers and will cache the 
> HDFSBlockDistribution for those regions. This is easy and quick. Admins can 
> tune the frequency based on size of the cluster. On a ~90 nodes cluster with 
> 500k regions and a prototype implementation and no load, it took about 5 
> seconds to get all HDFSBlockDistribution from RS.
> # The cache will be an extension of RegionLocationFinder (subclass), if 
> needed to keep the implementation simple. Probably will get clear with 
> implementation.
> # Balancer will use the new cache to get all HDFSBlockDistribution. If there 
> is a new region and Chore didn’t get the block distribution from RS during 
> its previous run, then it will be computed by RegionLocationFinder the same 
> way it has been done now. If the Chore runs more frequently like every hour, 
> then this recomputation will be drastically reduced.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers

2020-01-23 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HBASE-20643:
--
Fix Version/s: (was: 2.3.0)

> Getting HDFSBlockDist in Master by querying RegionServers
> -
>
> Key: HBASE-20643
> URL: https://issues.apache.org/jira/browse/HBASE-20643
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
>  Labels: balancer
> Fix For: 3.0.0, 1.7.0
>
>
> Region locality information is needed by the balancer to generate region 
> plans. Computing HDFSBlockDistribution is expensive on larger clusters and 
> adds load to the NameNode. This also needs to be recomputed on a master 
> restart. The proposal is to get the HDFSBlockDistribution from the 
> RegionServers instead of computing it in Master. RS already has this 
> information and we could just reuse it by querying it. RS already passes 
> dataLocality info via RegionLoad today.
> Proposed Implementation: This is a high-level overview.
> # A RegionServer API has to be added which will return HDFSBlockDistribution 
> for all the regions it hosts. RS already has this info. Since ClusterStatus 
> has already become bulky and we don’t need updated locality so fast, it’s 
> better to have another API rather than add this to RegionLoad and pass it 
> along with RSReport.
> # Master will have a Chore to query all RegionServers and will cache the 
> HDFSBlockDistribution for those regions. This is easy and quick. Admins can 
> tune the frequency based on size of the cluster. On a ~90 nodes cluster with 
> 500k regions and a prototype implementation and no load, it took about 5 
> seconds to get all HDFSBlockDistribution from RS.
> # The cache will be an extension of RegionLocationFinder (subclass), if 
> needed to keep the implementation simple. Probably will get clear with 
> implementation.
> # Balancer will use the new cache to get all HDFSBlockDistribution. If there 
> is a new region and Chore didn’t get the block distribution from RS during 
> its previous run, then it will be computed by RegionLocationFinder the same 
> way it has been done now. If the Chore runs more frequently like every hour, 
> then this recomputation will be drastically reduced.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers

2019-10-07 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-20643:

Fix Version/s: (was: 1.5.0)
   1.6.0
   3.0.0

> Getting HDFSBlockDist in Master by querying RegionServers
> -
>
> Key: HBASE-20643
> URL: https://issues.apache.org/jira/browse/HBASE-20643
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
>  Labels: balancer
> Fix For: 3.0.0, 2.3.0, 1.6.0
>
>
> Region locality information is needed by the balancer to generate region 
> plans. Computing HDFSBlockDistribution is expensive on larger clusters and 
> adds load to the NameNode. This also needs to be recomputed on a master 
> restart. The proposal is to get the HDFSBlockDistribution from the 
> RegionServers instead of computing it in Master. RS already has this 
> information and we could just reuse it by querying it. RS already passes 
> dataLocality info via RegionLoad today.
> Proposed Implementation: This is a high-level overview.
> # A RegionServer API has to be added which will return HDFSBlockDistribution 
> for all the regions it hosts. RS already has this info. Since ClusterStatus 
> has already become bulky and we don’t need updated locality so fast, it’s 
> better to have another API rather than add this to RegionLoad and pass it 
> along with RSReport.
> # Master will have a Chore to query all RegionServers and will cache the 
> HDFSBlockDistribution for those regions. This is easy and quick. Admins can 
> tune the frequency based on size of the cluster. On a ~90 nodes cluster with 
> 500k regions and a prototype implementation and no load, it took about 5 
> seconds to get all HDFSBlockDistribution from RS.
> # The cache will be an extension of RegionLocationFinder (subclass), if 
> needed to keep the implementation simple. Probably will get clear with 
> implementation.
> # Balancer will use the new cache to get all HDFSBlockDistribution. If there 
> is a new region and Chore didn’t get the block distribution from RS during 
> its previous run, then it will be computed by RegionLocationFinder the same 
> way it has been done now. If the Chore runs more frequently like every hour, 
> then this recomputation will be drastically reduced.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers

2019-04-22 Thread Biju Nair (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biju Nair updated HBASE-20643:
--
Labels: balancer  (was: )

> Getting HDFSBlockDist in Master by querying RegionServers
> -
>
> Key: HBASE-20643
> URL: https://issues.apache.org/jira/browse/HBASE-20643
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
>  Labels: balancer
> Fix For: 1.5.0, 2.3.0
>
>
> Region locality information is needed by the balancer to generate region 
> plans. Computing HDFSBlockDistribution is expensive on larger clusters and 
> adds load to the NameNode. This also needs to be recomputed on a master 
> restart. The proposal is to get the HDFSBlockDistribution from the 
> RegionServers instead of computing it in Master. RS already has this 
> information and we could just reuse it by querying it. RS already passes 
> dataLocality info via RegionLoad today.
> Proposed Implementation: This is a high-level overview.
> # A RegionServer API has to be added which will return HDFSBlockDistribution 
> for all the regions it hosts. RS already has this info. Since ClusterStatus 
> has already become bulky and we don’t need updated locality so fast, it’s 
> better to have another API rather than add this to RegionLoad and pass it 
> along with RSReport.
> # Master will have a Chore to query all RegionServers and will cache the 
> HDFSBlockDistribution for those regions. This is easy and quick. Admins can 
> tune the frequency based on size of the cluster. On a ~90 nodes cluster with 
> 500k regions and a prototype implementation and no load, it took about 5 
> seconds to get all HDFSBlockDistribution from RS.
> # The cache will be an extension of RegionLocationFinder (subclass), if 
> needed to keep the implementation simple. Probably will get clear with 
> implementation.
> # Balancer will use the new cache to get all HDFSBlockDistribution. If there 
> is a new region and Chore didn’t get the block distribution from RS during 
> its previous run, then it will be computed by RegionLocationFinder the same 
> way it has been done now. If the Chore runs more frequently like every hour, 
> then this recomputation will be drastically reduced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers

2019-02-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-20643:
---
Fix Version/s: (was: 1.5.0)
   1.5.1

> Getting HDFSBlockDist in Master by querying RegionServers
> -
>
> Key: HBASE-20643
> URL: https://issues.apache.org/jira/browse/HBASE-20643
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 2.3.0, 1.5.1
>
>
> Region locality information is needed by the balancer to generate region 
> plans. Computing HDFSBlockDistribution is expensive on larger clusters and 
> adds load to the NameNode. This also needs to be recomputed on a master 
> restart. The proposal is to get the HDFSBlockDistribution from the 
> RegionServers instead of computing it in Master. RS already has this 
> information and we could just reuse it by querying it. RS already passes 
> dataLocality info via RegionLoad today.
> Proposed Implementation: This is a high-level overview.
> # A RegionServer API has to be added which will return HDFSBlockDistribution 
> for all the regions it hosts. RS already has this info. Since ClusterStatus 
> has already become bulky and we don’t need updated locality so fast, it’s 
> better to have another API rather than add this to RegionLoad and pass it 
> along with RSReport.
> # Master will have a Chore to query all RegionServers and will cache the 
> HDFSBlockDistribution for those regions. This is easy and quick. Admins can 
> tune the frequency based on size of the cluster. On a ~90 nodes cluster with 
> 500k regions and a prototype implementation and no load, it took about 5 
> seconds to get all HDFSBlockDistribution from RS.
> # The cache will be an extension of RegionLocationFinder (subclass), if 
> needed to keep the implementation simple. Probably will get clear with 
> implementation.
> # Balancer will use the new cache to get all HDFSBlockDistribution. If there 
> is a new region and Chore didn’t get the block distribution from RS during 
> its previous run, then it will be computed by RegionLocationFinder the same 
> way it has been done now. If the Chore runs more frequently like every hour, 
> then this recomputation will be drastically reduced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers

2019-01-31 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-20643:
---
Fix Version/s: (was: 2.2.0)
   2.3.0

> Getting HDFSBlockDist in Master by querying RegionServers
> -
>
> Key: HBASE-20643
> URL: https://issues.apache.org/jira/browse/HBASE-20643
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 2.3.0
>
>
> Region locality information is needed by the balancer to generate region 
> plans. Computing HDFSBlockDistribution is expensive on larger clusters and 
> adds load to the NameNode. This also needs to be recomputed on a master 
> restart. The proposal is to get the HDFSBlockDistribution from the 
> RegionServers instead of computing it in Master. RS already has this 
> information and we could just reuse it by querying it. RS already passes 
> dataLocality info via RegionLoad today.
> Proposed Implementation: This is a high-level overview.
> # A RegionServer API has to be added which will return HDFSBlockDistribution 
> for all the regions it hosts. RS already has this info. Since ClusterStatus 
> has already become bulky and we don’t need updated locality so fast, it’s 
> better to have another API rather than add this to RegionLoad and pass it 
> along with RSReport.
> # Master will have a Chore to query all RegionServers and will cache the 
> HDFSBlockDistribution for those regions. This is easy and quick. Admins can 
> tune the frequency based on size of the cluster. On a ~90 nodes cluster with 
> 500k regions and a prototype implementation and no load, it took about 5 
> seconds to get all HDFSBlockDistribution from RS.
> # The cache will be an extension of RegionLocationFinder (subclass), if 
> needed to keep the implementation simple. Probably will get clear with 
> implementation.
> # Balancer will use the new cache to get all HDFSBlockDistribution. If there 
> is a new region and Chore didn’t get the block distribution from RS during 
> its previous run, then it will be computed by RegionLocationFinder the same 
> way it has been done now. If the Chore runs more frequently like every hour, 
> then this recomputation will be drastically reduced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers

2018-10-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-20643:
---
Fix Version/s: (was: 1.4.8)

> Getting HDFSBlockDist in Master by querying RegionServers
> -
>
> Key: HBASE-20643
> URL: https://issues.apache.org/jira/browse/HBASE-20643
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 2.2.0
>
>
> Region locality information is needed by the balancer to generate region 
> plans. Computing HDFSBlockDistribution is expensive on larger clusters and 
> adds load to the NameNode. This also needs to be recomputed on a master 
> restart. The proposal is to get the HDFSBlockDistribution from the 
> RegionServers instead of computing it in Master. RS already has this 
> information and we could just reuse it by querying it. RS already passes 
> dataLocality info via RegionLoad today.
> Proposed Implementation: This is a high-level overview.
> # A RegionServer API has to be added which will return HDFSBlockDistribution 
> for all the regions it hosts. RS already has this info. Since ClusterStatus 
> has already become bulky and we don’t need updated locality so fast, it’s 
> better to have another API rather than add this to RegionLoad and pass it 
> along with RSReport.
> # Master will have a Chore to query all RegionServers and will cache the 
> HDFSBlockDistribution for those regions. This is easy and quick. Admins can 
> tune the frequency based on size of the cluster. On a ~90 nodes cluster with 
> 500k regions and a prototype implementation and no load, it took about 5 
> seconds to get all HDFSBlockDistribution from RS.
> # The cache will be an extension of RegionLocationFinder (subclass), if 
> needed to keep the implementation simple. Probably will get clear with 
> implementation.
> # Balancer will use the new cache to get all HDFSBlockDistribution. If there 
> is a new region and Chore didn’t get the block distribution from RS during 
> its previous run, then it will be computed by RegionLocationFinder the same 
> way it has been done now. If the Chore runs more frequently like every hour, 
> then this recomputation will be drastically reduced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers

2018-08-21 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-20643:
---
Fix Version/s: (was: 1.4.7)
   1.4.8

> Getting HDFSBlockDist in Master by querying RegionServers
> -
>
> Key: HBASE-20643
> URL: https://issues.apache.org/jira/browse/HBASE-20643
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 2.2.0, 1.4.8
>
>
> Region locality information is needed by the balancer to generate region 
> plans. Computing HDFSBlockDistribution is expensive on larger clusters and 
> adds load to the NameNode. This also needs to be recomputed on a master 
> restart. The proposal is to get the HDFSBlockDistribution from the 
> RegionServers instead of computing it in Master. RS already has this 
> information and we could just reuse it by querying it. RS already passes 
> dataLocality info via RegionLoad today.
> Proposed Implementation: This is a high-level overview.
> # A RegionServer API has to be added which will return HDFSBlockDistribution 
> for all the regions it hosts. RS already has this info. Since ClusterStatus 
> has already become bulky and we don’t need updated locality so fast, it’s 
> better to have another API rather than add this to RegionLoad and pass it 
> along with RSReport.
> # Master will have a Chore to query all RegionServers and will cache the 
> HDFSBlockDistribution for those regions. This is easy and quick. Admins can 
> tune the frequency based on size of the cluster. On a ~90 nodes cluster with 
> 500k regions and a prototype implementation and no load, it took about 5 
> seconds to get all HDFSBlockDistribution from RS.
> # The cache will be an extension of RegionLocationFinder (subclass), if 
> needed to keep the implementation simple. Probably will get clear with 
> implementation.
> # Balancer will use the new cache to get all HDFSBlockDistribution. If there 
> is a new region and Chore didn’t get the block distribution from RS during 
> its previous run, then it will be computed by RegionLocationFinder the same 
> way it has been done now. If the Chore runs more frequently like every hour, 
> then this recomputation will be drastically reduced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers

2018-07-19 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-20643:
---
Fix Version/s: (was: 1.4.6)
   1.4.7

> Getting HDFSBlockDist in Master by querying RegionServers
> -
>
> Key: HBASE-20643
> URL: https://issues.apache.org/jira/browse/HBASE-20643
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 2.2.0, 1.4.7
>
>
> Region locality information is needed by the balancer to generate region 
> plans. Computing HDFSBlockDistribution is expensive on larger clusters and 
> adds load to the NameNode. This also needs to be recomputed on a master 
> restart. The proposal is to get the HDFSBlockDistribution from the 
> RegionServers instead of computing it in Master. RS already has this 
> information and we could just reuse it by querying it. RS already passes 
> dataLocality info via RegionLoad today.
> Proposed Implementation: This is a high-level overview.
> # A RegionServer API has to be added which will return HDFSBlockDistribution 
> for all the regions it hosts. RS already has this info. Since ClusterStatus 
> has already become bulky and we don’t need updated locality so fast, it’s 
> better to have another API rather than add this to RegionLoad and pass it 
> along with RSReport.
> # Master will have a Chore to query all RegionServers and will cache the 
> HDFSBlockDistribution for those regions. This is easy and quick. Admins can 
> tune the frequency based on size of the cluster. On a ~90 nodes cluster with 
> 500k regions and a prototype implementation and no load, it took about 5 
> seconds to get all HDFSBlockDistribution from RS.
> # The cache will be an extension of RegionLocationFinder (subclass), if 
> needed to keep the implementation simple. Probably will get clear with 
> implementation.
> # Balancer will use the new cache to get all HDFSBlockDistribution. If there 
> is a new region and Chore didn’t get the block distribution from RS during 
> its previous run, then it will be computed by RegionLocationFinder the same 
> way it has been done now. If the Chore runs more frequently like every hour, 
> then this recomputation will be drastically reduced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers

2018-06-19 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-20643:
--
Fix Version/s: (was: 2.1.0)
   2.2.0

> Getting HDFSBlockDist in Master by querying RegionServers
> -
>
> Key: HBASE-20643
> URL: https://issues.apache.org/jira/browse/HBASE-20643
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 1.4.6, 2.2.0
>
>
> Region locality information is needed by the balancer to generate region 
> plans. Computing HDFSBlockDistribution is expensive on larger clusters and 
> adds load to the NameNode. This also needs to be recomputed on a master 
> restart. The proposal is to get the HDFSBlockDistribution from the 
> RegionServers instead of computing it in Master. RS already has this 
> information and we could just reuse it by querying it. RS already passes 
> dataLocality info via RegionLoad today.
> Proposed Implementation: This is a high-level overview.
> # A RegionServer API has to be added which will return HDFSBlockDistribution 
> for all the regions it hosts. RS already has this info. Since ClusterStatus 
> has already become bulky and we don’t need updated locality so fast, it’s 
> better to have another API rather than add this to RegionLoad and pass it 
> along with RSReport.
> # Master will have a Chore to query all RegionServers and will cache the 
> HDFSBlockDistribution for those regions. This is easy and quick. Admins can 
> tune the frequency based on size of the cluster. On a ~90 nodes cluster with 
> 500k regions and a prototype implementation and no load, it took about 5 
> seconds to get all HDFSBlockDistribution from RS.
> # The cache will be an extension of RegionLocationFinder (subclass), if 
> needed to keep the implementation simple. Probably will get clear with 
> implementation.
> # Balancer will use the new cache to get all HDFSBlockDistribution. If there 
> is a new region and Chore didn’t get the block distribution from RS during 
> its previous run, then it will be computed by RegionLocationFinder the same 
> way it has been done now. If the Chore runs more frequently like every hour, 
> then this recomputation will be drastically reduced.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)