[jira] [Updated] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers
[ https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reid Chan updated HBASE-20643:
------------------------------
    Fix Version/s:     (was: 1.7.0)
                   1.8.0

> Getting HDFSBlockDist in Master by querying RegionServers
> ---------------------------------------------------------
>
>                 Key: HBASE-20643
>                 URL: https://issues.apache.org/jira/browse/HBASE-20643
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>            Priority: Major
>              Labels: balancer
>             Fix For: 3.0.0-alpha-1, 1.8.0
>
>
> Region locality information is needed by the balancer to generate region
> plans. Computing HDFSBlockDistribution is expensive on larger clusters and
> adds load to the NameNode. It also has to be recomputed on a master
> restart. The proposal is to get the HDFSBlockDistribution from the
> RegionServers instead of computing it in Master. RS already has this
> information, and we can reuse it by querying for it. RS already passes
> dataLocality info via RegionLoad today.
>
> Proposed Implementation: This is a high-level overview.
> # A RegionServer API has to be added which will return HDFSBlockDistribution
> for all the regions it hosts. RS already has this info. Since ClusterStatus
> has already become bulky and we don’t need locality updated that quickly,
> it’s better to have another API rather than add this to RegionLoad and pass
> it along with RSReport.
> # Master will have a Chore to query all RegionServers and will cache the
> HDFSBlockDistribution for those regions. This is easy and quick. Admins can
> tune the frequency based on the size of the cluster. On a ~90-node cluster
> with 500k regions, a prototype implementation, and no load, it took about 5
> seconds to get all HDFSBlockDistribution from RS.
> # The cache will be an extension (subclass) of RegionLocationFinder, if
> needed, to keep the implementation simple. This will probably become clearer
> during implementation.
> # Balancer will use the new cache to get all HDFSBlockDistribution. If there
> is a new region and the Chore didn’t get its block distribution from RS
> during its previous run, then it will be computed by RegionLocationFinder
> the same way it is done now. If the Chore runs frequently, say every hour,
> this recomputation will be drastically reduced.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
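The four proposed steps can be illustrated with a minimal Java sketch. It is not the HBase implementation: `BlockDist`, `RegionServerClient`, and `LocalityCache` are hypothetical stand-ins for HDFSBlockDistribution, the proposed RegionServer API, and the RegionLocationFinder-based cache. In HBase the `refresh` method would presumably be driven by a periodic Chore; here it is called directly, and stale entries for moved regions are not evicted, to keep the sketch short.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for HDFSBlockDistribution: bytes of region data per host. */
final class BlockDist {
    final Map<String, Long> bytesByHost;

    BlockDist(Map<String, Long> bytesByHost) {
        this.bytesByHost = new HashMap<>(bytesByHost);
    }
}

/** Hypothetical stand-in for the proposed RegionServer API (step 1). */
interface RegionServerClient {
    /** Returns region name -> block distribution for every region this server hosts. */
    Map<String, BlockDist> getBlockDistributions();
}

/** Hypothetical Master-side cache (steps 2 to 4). */
final class LocalityCache {
    private final Map<String, BlockDist> cache = new ConcurrentHashMap<>();
    // Assumption: the fallback models the existing, expensive NameNode-based computation.
    private final Function<String, BlockDist> fallback;

    LocalityCache(Function<String, BlockDist> fallback) {
        this.fallback = fallback;
    }

    /** Body of the periodic chore: pull from every RegionServer and overwrite cached entries. */
    void refresh(List<RegionServerClient> servers) {
        for (RegionServerClient rs : servers) {
            cache.putAll(rs.getBlockDistributions());
        }
    }

    /**
     * Balancer-side lookup: return the cached value if the last refresh saw the region;
     * otherwise compute it via the fallback and remember the result.
     */
    BlockDist get(String regionName) {
        return cache.computeIfAbsent(regionName, fallback);
    }
}
```

With this shape, a region reported by a RegionServer never triggers the fallback, and a region created after the last refresh triggers it only once, which matches the reduced recomputation the proposal describes.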
[jira] [Updated] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers
[ https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack updated HBASE-20643:
----------------------------------
    Fix Version/s:     (was: 2.3.0)

> Getting HDFSBlockDist in Master by querying RegionServers
> ---------------------------------------------------------
>
>                 Key: HBASE-20643
>                 URL: https://issues.apache.org/jira/browse/HBASE-20643
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>            Priority: Major
>              Labels: balancer
>             Fix For: 3.0.0, 1.7.0
[jira] [Updated] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers
[ https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Kyle Purtell updated HBASE-20643:
----------------------------------------
    Fix Version/s:     (was: 1.5.0)
                   1.6.0
                   3.0.0

> Getting HDFSBlockDist in Master by querying RegionServers
> ---------------------------------------------------------
>
>                 Key: HBASE-20643
>                 URL: https://issues.apache.org/jira/browse/HBASE-20643
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>            Priority: Major
>              Labels: balancer
>             Fix For: 3.0.0, 2.3.0, 1.6.0
[jira] [Updated] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers
[ https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Biju Nair updated HBASE-20643:
------------------------------
    Labels: balancer  (was: )

> Getting HDFSBlockDist in Master by querying RegionServers
> ---------------------------------------------------------
>
>                 Key: HBASE-20643
>                 URL: https://issues.apache.org/jira/browse/HBASE-20643
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>            Priority: Major
>              Labels: balancer
>             Fix For: 1.5.0, 2.3.0

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers
[ https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-20643:
-----------------------------------
    Fix Version/s:     (was: 1.5.0)
                   1.5.1

> Getting HDFSBlockDist in Master by querying RegionServers
> ---------------------------------------------------------
>
>                 Key: HBASE-20643
>                 URL: https://issues.apache.org/jira/browse/HBASE-20643
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>            Priority: Major
>             Fix For: 2.3.0, 1.5.1
[jira] [Updated] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers
[ https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang updated HBASE-20643:
-----------------------------------
    Fix Version/s:     (was: 2.2.0)
                   2.3.0

> Getting HDFSBlockDist in Master by querying RegionServers
> ---------------------------------------------------------
>
>                 Key: HBASE-20643
>                 URL: https://issues.apache.org/jira/browse/HBASE-20643
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>            Priority: Major
>             Fix For: 1.5.0, 2.3.0
[jira] [Updated] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers
[ https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-20643:
-----------------------------------
    Fix Version/s:     (was: 1.4.8)

> Getting HDFSBlockDist in Master by querying RegionServers
> ---------------------------------------------------------
>
>                 Key: HBASE-20643
>                 URL: https://issues.apache.org/jira/browse/HBASE-20643
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>            Priority: Major
>             Fix For: 1.5.0, 2.2.0
[jira] [Updated] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers
[ https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-20643:
-----------------------------------
    Fix Version/s:     (was: 1.4.7)
                   1.4.8

> Getting HDFSBlockDist in Master by querying RegionServers
> ---------------------------------------------------------
>
>                 Key: HBASE-20643
>                 URL: https://issues.apache.org/jira/browse/HBASE-20643
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>            Priority: Major
>             Fix For: 1.5.0, 2.2.0, 1.4.8
[jira] [Updated] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers
[ https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-20643:
-----------------------------------
    Fix Version/s:     (was: 1.4.6)
                   1.4.7

> Getting HDFSBlockDist in Master by querying RegionServers
> ---------------------------------------------------------
>
>                 Key: HBASE-20643
>                 URL: https://issues.apache.org/jira/browse/HBASE-20643
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>            Priority: Major
>             Fix For: 1.5.0, 2.2.0, 1.4.7
[jira] [Updated] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers
[ https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang updated HBASE-20643:
------------------------------
    Fix Version/s:     (was: 2.1.0)
                   2.2.0

> Getting HDFSBlockDist in Master by querying RegionServers
> ---------------------------------------------------------
>
>                 Key: HBASE-20643
>                 URL: https://issues.apache.org/jira/browse/HBASE-20643
>             Project: HBase
>          Issue Type: Improvement
>          Components: Balancer
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>            Priority: Major
>             Fix For: 1.5.0, 1.4.6, 2.2.0