[jira] [Comment Edited] (HBASE-21706) Inconsistency of fs.defaultFS between active and standby masters
[ https://issues.apache.org/jira/browse/HBASE-21706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741442#comment-16741442 ]

Lei Chen edited comment on HBASE-21706 at 1/13/19 2:18 AM:
---

I don't have a running Apache HBase cluster right now, but I do see the same code in Apache HBase 1.1.2:
 - [HRegionServer.java|https://github.com/apache/hbase/blob/rel/1.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L565]
 - [MasterFileSystem.java|https://github.com/apache/hbase/blob/rel/1.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L121]

I believe the Apache master branch also has the same code.

After failover, the new active master finds the right fs ("hdfs://DEV-CLUSTER"), but the new standby (previously active) master will join the rest and have it set to "hdfs://DEV-CLUSTER/hbase-root".

I would like to provide a patch if this is indeed unexpected behavior, but could anyone please help me identify some cases where `fs.defaultFS` from standby masters might be used?

Thanks

was (Author: leochen4891):
I don't have a running Apache HBase cluster right now, but I do see the same code in Apache HBase 1.1.2:
 - [HRegionServer.java|https://github.com/apache/hbase/blob/rel/1.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L565]
 - [MasterFileSystem.java|https://github.com/apache/hbase/blob/rel/1.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L121]I believe the Apache master branch also has the same code.

After failover, the new active master finds the right fs ("hdfs://DEV-CLUSTER"), but the new standby (previously active) master will join the rest and have it set to "hdfs://DEV-CLUSTER/hbase-root".

I would like to provide a patch if this is indeed unexpected behavior, but could anyone please help me identify some cases where `fs.defaultFS` from standby masters might be used?

Thanks

> Inconsistency of fs.defaultFS between active and standby masters
>
> Key: HBASE-21706
> URL: https://issues.apache.org/jira/browse/HBASE-21706
> Project: HBase
> Issue Type: Bug
> Components: conf, master
> Affects Versions: 1.1.2
> Reporter: Lei Chen
> Priority: Minor
>
> I'm using HDP-2.6.3.22-1 with HBase HA configured. I noticed that active and standby masters have different `fs.defaultFS` on their /conf pages.
>
> Given `fs.defaultFS` is set to : and `hbase.rootdir` is set to :/ in core-site.xml on all the hosts, it looks like standby masters have `fs.defaultFS` programmatically set to the same value as `hbase.rootdir`.
>
> For example, on a 3 heads cluster DEV-CLUSTER, my active master has the following line on the /conf page
> {code:java}
> <name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER</value><source>programatically</source>
> {code}
> but standby masters have
> {code:java}
> <name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER/hbase-root</value><source>programatically</source>
> {code}
> Please correct me if this is not a bug but a feature, however I find this behavior surprising plus I cannot locate any related document.
>
> From a quick look at the code, the cause seems to be that standby masters got the property set in [HRegionServer.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L649-L652], and active master got it set in a different way in [MasterFileSystem.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L132-L137].

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
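The two values reported for HBASE-21706 above can be reproduced outside HBase with plain `java.net.URI`. This is only a sketch under an assumption the thread does not confirm beyond the linked lines: that one code path stores the full `hbase.rootdir` value as `fs.defaultFS`, while the other keeps only the scheme and authority of the filesystem. The class and method names are hypothetical and are not HBase APIs.

```java
import java.net.URI;

// Hypothetical illustration; this is not HBase code.
final class FsDefaultSketch {

    // Assumed standby-style path: the whole root directory URI is stored as-is.
    static String fromRootDir(String hbaseRootDir) {
        return URI.create(hbaseRootDir).toString();
    }

    // Assumed active-style path: only scheme and authority are kept,
    // i.e. the URI of the filesystem rather than of a directory in it.
    static String fromFileSystemUri(String hbaseRootDir) {
        URI root = URI.create(hbaseRootDir);
        return root.getScheme() + "://" + root.getAuthority();
    }

    public static void main(String[] args) {
        String rootDir = "hdfs://DEV-CLUSTER/hbase-root";
        System.out.println("standby-style: " + fromRootDir(rootDir));
        System.out.println("active-style:  " + fromFileSystemUri(rootDir));
    }
}
```

If the assumption holds, a fix would presumably make both paths derive the value the same way; which of the two is the intended one is exactly the open question in the comment.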
[jira] [Comment Edited] (HBASE-21706) Inconsistency of fs.defaultFS between active and standby masters
[ https://issues.apache.org/jira/browse/HBASE-21706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741442#comment-16741442 ]

Lei Chen edited comment on HBASE-21706 at 1/13/19 2:17 AM:
---

I don't have a running Apache HBase cluster right now, but I do see the same code in Apache HBase 1.1.2:
 - [HRegionServer.java|https://github.com/apache/hbase/blob/rel/1.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L565]
 - [MasterFileSystem.java|https://github.com/apache/hbase/blob/rel/1.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L121]I believe the Apache master branch also has the same code.

After failover, the new active master finds the right fs ("hdfs://DEV-CLUSTER"), but the new standby (previously active) master will join the rest and have it set to "hdfs://DEV-CLUSTER/hbase-root".

I would like to provide a patch if this is indeed unexpected behavior, but could anyone please help me identify some cases where `fs.defaultFS` from standby masters might be used?

Thanks

was (Author: leochen4891):
I don't have a running Apache HBase cluster right now, but I do see the same code in Apache HBase 1.1.2:
 - [HRegionServer.java|https://github.com/apache/hbase/blob/rel/1.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L565]
 - [MasterFileSystem.java |https://github.com/apache/hbase/blob/rel/1.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L121]I believe the Apache master branch also has the same code.

After failover, the new active master finds the right fs ("hdfs://DEV-CLUSTER"), but the new standby (previously active) master will join the rest and have it set to "hdfs://DEV-CLUSTER/hbase-root".

I would like to provide a patch if this is indeed unexpected behavior, but could anyone please help me identify some cases where `fs.defaultFS` from standby masters might be used?

Thanks
[jira] [Commented] (HBASE-21706) Inconsistency of fs.defaultFS between active and standby masters
[ https://issues.apache.org/jira/browse/HBASE-21706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741442#comment-16741442 ]

Lei Chen commented on HBASE-21706:
--

I don't have a running Apache HBase cluster right now, but I do see the same code in Apache HBase 1.1.2:
 - [HRegionServer.java|https://github.com/apache/hbase/blob/rel/1.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L565]
 - [MasterFileSystem.java |https://github.com/apache/hbase/blob/rel/1.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L121]I believe the Apache master branch also has the same code.

After failover, the new active master finds the right fs ("hdfs://DEV-CLUSTER"), but the new standby (previously active) master will join the rest and have it set to "hdfs://DEV-CLUSTER/hbase-root".

I would like to provide a patch if this is indeed unexpected behavior, but could anyone please help me identify some cases where `fs.defaultFS` from standby masters might be used?

Thanks
[jira] [Updated] (HBASE-21706) Inconsistency of fs.defaultFS between active and standby masters
[ https://issues.apache.org/jira/browse/HBASE-21706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Chen updated HBASE-21706:
-

Description:

I'm using HDP-2.6.3.22-1 with HBase HA configured. I noticed that active and standby masters have different `fs.defaultFS` on their /conf pages.

Given `fs.defaultFS` is set to : and `hbase.rootdir` is set to :/ in core-site.xml on all the hosts, it looks like standby masters have `fs.defaultFS` programmatically set to the same value as `hbase.rootdir`.

For example, on a 3 heads cluster DEV-CLUSTER, my active master has the following line on the /conf page
{code:java}
<name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER</value><source>programatically</source>
{code}
but standby masters have
{code:java}
<name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER/hbase-root</value><source>programatically</source>
{code}
Please correct me if this is not a bug but a feature, however I find this behavior surprising plus I cannot locate any related document.

From a quick looking at the code, the cause seems to be that standby masters got the property set in [HRegionServer.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L649-L652], and active master got it set in a different way in [MasterFileSystem.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L132-L137].

was:

I'm using HDP-2.6.3.22-1 with HBase HA configured. I noticed that active and standby masters have different `fs.defaultFS` on their /conf pages.

Given `fs.defaultFS` is set to : and `hbase.rootdir` is set to :/ in core-site.xml on all the hosts, it looks like standby masters have `fs.defaultFS` programmatically set to the same value as `hbase.rootdir`.

For example, on a 3 heads cluster DEV-CLUSTER, my active master has the following line on the /conf page
{code:java}
<name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER</value><source>programatically</source>
{code}
but standby masters have
{code:java}
<name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER/hbase-root</value><source>programatically</source>
{code}
Please correct me if this is not a bug but a feature, but I find this behavior surprising plus I cannot locate any related document.

From a quick looking at the code, the cause seems to be that standby masters got the property set in [HRegionServer.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L649-L652], and active master got it set in a different way in [MasterFileSystem.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L132-L137].
[jira] [Updated] (HBASE-21706) Inconsistency of fs.defaultFS between active and standby masters
[ https://issues.apache.org/jira/browse/HBASE-21706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Chen updated HBASE-21706:
-

Description:

I'm using HDP-2.6.3.22-1 with HBase HA configured. I noticed that active and standby masters have different `fs.defaultFS` on their /conf pages.

Given `fs.defaultFS` is set to : and `hbase.rootdir` is set to :/ in core-site.xml on all the hosts, it looks like standby masters have `fs.defaultFS` programmatically set to the same value as `hbase.rootdir`.

For example, on a 3 heads cluster DEV-CLUSTER, my active master has the following line on the /conf page
{code:java}
<name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER</value><source>programatically</source>
{code}
but standby masters have
{code:java}
<name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER/hbase-root</value><source>programatically</source>
{code}
Please correct me if this is not a bug but a feature, however I find this behavior surprising plus I cannot locate any related document.

From a quick look at the code, the cause seems to be that standby masters got the property set in [HRegionServer.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L649-L652], and active master got it set in a different way in [MasterFileSystem.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L132-L137].

was:

I'm using HDP-2.6.3.22-1 with HBase HA configured. I noticed that active and standby masters have different `fs.defaultFS` on their /conf pages.

Given `fs.defaultFS` is set to : and `hbase.rootdir` is set to :/ in core-site.xml on all the hosts, it looks like standby masters have `fs.defaultFS` programmatically set to the same value as `hbase.rootdir`.

For example, on a 3 heads cluster DEV-CLUSTER, my active master has the following line on the /conf page
{code:java}
<name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER</value><source>programatically</source>
{code}
but standby masters have
{code:java}
<name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER/hbase-root</value><source>programatically</source>
{code}
Please correct me if this is not a bug but a feature, however I find this behavior surprising plus I cannot locate any related document.

From a quick looking at the code, the cause seems to be that standby masters got the property set in [HRegionServer.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L649-L652], and active master got it set in a different way in [MasterFileSystem.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L132-L137].
[jira] [Updated] (HBASE-21706) Inconsistency of fs.defaultFS between active and standby masters
[ https://issues.apache.org/jira/browse/HBASE-21706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Chen updated HBASE-21706:
-

Description:

I'm using HDP-2.6.3.22-1 with HBase HA configured. I noticed that active and standby masters have different `fs.defaultFS` on their /conf pages.

Given `fs.defaultFS` is set to : and `hbase.rootdir` is set to :/ in core-site.xml on all the hosts, it looks like standby masters have `fs.defaultFS` programmatically set to the same value as `hbase.rootdir`.

For example, on a 3 heads cluster DEV-CLUSTER, my active master has the following line on the /conf page
{code:java}
<name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER</value><source>programatically</source>
{code}
but standby masters have
{code:java}
<name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER/hbase-root</value><source>programatically</source>
{code}
Please correct me if this is not a bug but a feature, but I find this behavior surprising plus I cannot locate any related document.

From a quick looking at the code, the cause seems to be that standby masters got the property set in [HRegionServer.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L649-L652], and active master got it set in a different way in [MasterFileSystem.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L132-L137].

was:

I'm using HDP-2.3.6.22-1 with HBase HA configured. I noticed that active and standby masters have different `fs.defaultFS` on their /conf pages.

Given `fs.defaultFS` is set to : and `hbase.rootdir` is set to :/ in core-site.xml on all the hosts, it looks like standby masters have `fs.defaultFS` programmatically set to the same value as `hbase.rootdir`.

For example, on a 3 heads cluster DEV-CLUSTER, my active master has the following line on the /conf page
{code:java}
<name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER</value><source>programatically</source>
{code}
but standby masters have
{code:java}
<name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER/hbase-root</value><source>programatically</source>
{code}
Please correct me if this is not a bug but a feature, but I find this behavior surprising plus I cannot locate any related document.

From a quick looking at the code, the cause seems to be that standby masters got the property set in [HRegionServer.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L649-L652], and active master got it set in a different way in [MasterFileSystem.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L132-L137].
[jira] [Created] (HBASE-21706) Inconsistency of fs.defaultFS between active and standby masters
Lei Chen created HBASE-21706:

Summary: Inconsistency of fs.defaultFS between active and standby masters
Key: HBASE-21706
URL: https://issues.apache.org/jira/browse/HBASE-21706
Project: HBase
Issue Type: Bug
Components: conf, master
Affects Versions: 1.1.2
Reporter: Lei Chen

I'm using HDP-2.3.6.22-1 with HBase HA configured. I noticed that active and standby masters have different `fs.defaultFS` on their /conf pages.

Given `fs.defaultFS` is set to : and `hbase.rootdir` is set to :/ in core-site.xml on all the hosts, it looks like standby masters have `fs.defaultFS` programmatically set to the same value as `hbase.rootdir`.

For example, on a 3 heads cluster DEV-CLUSTER, my active master has the following line on the /conf page
{code:java}
<name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER</value><source>programatically</source>
{code}
but standby masters have
{code:java}
<name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER/hbase-root</value><source>programatically</source>
{code}
Please correct me if this is not a bug but a feature, but I find this behavior surprising plus I cannot locate any related document.

From a quick looking at the code, the cause seems to be that standby masters got the property set in [HRegionServer.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L649-L652], and active master got it set in a different way in [MasterFileSystem.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L132-L137].
[jira] [Commented] (HBASE-16423) Add re-compare option to VerifyReplication to avoid occasional inconsistent rows
[ https://issues.apache.org/jira/browse/HBASE-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639905#comment-16639905 ]

Lei Chen commented on HBASE-16423:
--

I'm facing the false positive inconsistency problem you described here.

Having the thread sleep and compare again some time later looks like a good way to reduce noise, but may not be a guaranteed way to report inconsistency. As long as ingestion is running, it is possible that, by the time of the re-comparison, the target row on the source and on the replica has matched and diverged again.

A more sophisticated method may be required if the user needs 100% confidence.

> Add re-compare option to VerifyReplication to avoid occasional inconsistent rows
>
> Key: HBASE-16423
> URL: https://issues.apache.org/jira/browse/HBASE-16423
> Project: HBase
> Issue Type: Improvement
> Components: Replication
> Affects Versions: 2.0.0
> Reporter: Jianwei Cui
> Assignee: Jianwei Cui
> Priority: Minor
> Fix For: 1.4.0, 2.0.0
> Attachments: HBASE-16423-branch-1-v1.patch, HBASE-16423-v1.patch, HBASE-16423-v2.patch, HBASE-16423-v3.patch
>
> Because replication provides only eventual consistency, VerifyReplication may report inconsistent rows if data is being written to the source or peer clusters during scanning. These occasionally inconsistent rows will have the same data if we do the comparison again after a short period. It is not easy to find the really inconsistent rows if VerifyReplication reports a large number of such occasional inconsistencies. To avoid this case, we can add an option to make VerifyReplication read out the inconsistent rows again after sleeping a few seconds and re-compare the rows during scanning. This behavior follows the eventual consistency of HBase's replication. Suggestions and discussions are welcome.
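The sleep-and-compare-again idea discussed above can be sketched in a few lines. This is a hypothetical illustration, not VerifyReplication code: the class name, the `Supplier`-based row readers, and the single-retry policy are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.function.Supplier;

// Hypothetical illustration of "sleep, then compare again"; not VerifyReplication code.
final class RecompareSketch {

    // Returns true if the two reads agree immediately or after one delayed re-read.
    static boolean matchesWithRecompare(Supplier<byte[]> source, Supplier<byte[]> peer,
                                        long sleepMillis) {
        if (Arrays.equals(source.get(), peer.get())) {
            return true;
        }
        try {
            Thread.sleep(sleepMillis); // give replication time to catch up
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag, then re-read anyway
        }
        // Fresh reads from both sides, compared once more.
        return Arrays.equals(source.get(), peer.get());
    }

    public static void main(String[] args) {
        int[] peerReads = {0};
        Supplier<byte[]> source = () -> new byte[] {1};
        // A peer that is stale on its first read and caught up on the second.
        Supplier<byte[]> laggingPeer = () -> new byte[] {(byte) (peerReads[0]++ == 0 ? 0 : 1)};
        System.out.println(matchesWithRecompare(source, laggingPeer, 10L));
    }
}
```

As the comment points out, this only reduces noise. If the source row keeps changing, the second pair of reads can land on a different mismatch, so a remaining "inconsistent" verdict is not proof of a real replication problem.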
[jira] [Comment Edited] (HBASE-16423) Add re-compare option to VerifyReplication to avoid occasional inconsistent rows
[ https://issues.apache.org/jira/browse/HBASE-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639905#comment-16639905 ]

Lei Chen edited comment on HBASE-16423 at 10/5/18 2:50 PM:
---

I'm facing the false positive inconsistency problem you described here as well.

Having the thread sleep and compare again some time later looks like a good way to reduce noise, but may not be a guaranteed way to report inconsistency. As long as ingestion is running, it is possible that, by the time of the re-comparison, the target row on the source and on the replica has matched and diverged again.

A more sophisticated method may be required if the user needs 100% confidence.

was (Author: leochen4891):
I'm facing the false positive inconsistency problem you described here.

Having the thread sleep and compare again some time later looks like a good way to reduce noise, but may not be a guaranteed way to report inconsistency. As long as ingestion is running, it is possible that, by the time of the re-comparison, the target row on the source and on the replica has matched and diverged again.

A more sophisticated method may be required if the user needs 100% confidence.
[jira] [Comment Edited] (HBASE-18005) read replica: handle the case that region server hosting both primary replica and meta region is down
[ https://issues.apache.org/jira/browse/HBASE-18005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006621#comment-16006621 ] Lei Chen edited comment on HBASE-18005 at 5/11/17 5:18 PM: --- Thanks for the explanation and update. Yes, there is a gap between the primary meta region and its replica, defined by hbase.regionserver.meta.storefile.refresh.period, plus there is no notification mechanism at present, setting the hbase.meta.replica.count to 2 or 3 is indeed not a complete solution but an improve. Meanwhile, there is also a gap between the primary meta region and the one cached on the client side. The difference between the two gaps is how the gap is closed. The first one refreshes with a fixed interval while the second one updates when a miss is encountered. Please correct me if I'm wrong, the worst case I can imagine is 1. The locations of a primary region p1 and its replica r1 have changed. 2. The primary meta updates but its replica is not, due to the fixed interval 3. The region server that serves both primaries goes down 4. A client has not updated its meta cache after p1 and r1 was relocated, and now makes a get request to p1 In this case, neither the client cache nor the meta replica can provide the correct location of the target regions. That being said, I agree with you that the cached location of the replicas is still worth trying, and should be pardoned from clearing the meta cache, as you have proposed in the patch. was (Author: leochen4891): Thanks for the explanation and update. Yes, there is a gap between the primary meta region and its replica, defined by hbase.regionserver.meta.storefile.refresh.period, plus there is no notification mechanism at present, setting the hbase.meta.replica.count to 2 or 3 is indeed not a complete solution but an improve. Meanwhile, there is also a gap between the primary meta region and the one cached on the client side. The difference between the two gaps is how the gap is closed. 
The first one refreshes with a fixed interval while the second one updates when a miss is encountered. Please correct me if I'm wrong, the worst case I can imagine is 1. The locations of a primary region p1 and its replica r1 have changed. 2. The primary meta updates but its replica is not, due to the fixed interval 3. The region server that serves both primaries goes down 4. A client has not updated its meta cache after p1 and r1 was relocated, and now makes a get request to p1 That being said, I agree with you that the cached location of the replicas is still worth trying, and should be pardoned from clearing the meta cache, as you have proposed in the patch. > read replica: handle the case that region server hosting both primary replica > and meta region is down > - > > Key: HBASE-18005 > URL: https://issues.apache.org/jira/browse/HBASE-18005 > Project: HBase > Issue Type: Bug >Reporter: huaxiang sun >Assignee: huaxiang sun > Attachments: HBASE-18005-master-001.patch > > > Identified one corner case in testing that when the region server hosting > both primary replica and the meta region is down, the client tries to reload > the primary replica location from meta table, it is supposed to clean up only > the cached location for specific replicaId, but it clears caches for all > replicas. Please see > https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L813 > Since it takes some time for regions to be reassigned (including meta > region), the following may throw exception > https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java#L173 > This exception needs to be caught and it needs to get cached location (in > this case, the primary replica's location is not available). If there are > cached locations for other replicas, it can still go ahead to get stale > values from secondary replicas. 
> With meta replica, it still helps to not clean up the caches for all replicas > as the info from primary meta replica is up-to-date. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
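A minimal sketch of the idea in the issue description above: when the primary's location is stale, drop only the cached entry for that replicaId so that the other replicas remain available for stale reads. ReplicaLocationCache and all of its methods are hypothetical names made up for illustration; this is neither the HBase client's cache API nor the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical client-side location cache; NOT the HBase client API. */
final class ReplicaLocationCache {
    // region name -> (replicaId -> server name)
    private final Map<String, Map<Integer, String>> locations = new HashMap<>();

    void put(String region, int replicaId, String server) {
        locations.computeIfAbsent(region, r -> new HashMap<>()).put(replicaId, server);
    }

    /** Returns the cached server for one replica, or null if nothing is cached. */
    String get(String region, int replicaId) {
        Map<Integer, String> byReplica = locations.get(region);
        return byReplica == null ? null : byReplica.get(replicaId);
    }

    /** Drops every cached replica location of the region (the behavior reported as the bug). */
    void clearAll(String region) {
        locations.remove(region);
    }

    /** Drops only one replica's location; the others stay usable for stale reads. */
    void clearReplica(String region, int replicaId) {
        Map<Integer, String> byReplica = locations.get(region);
        if (byReplica != null) {
            byReplica.remove(replicaId);
            if (byReplica.isEmpty()) {
                locations.remove(region);
            }
        }
    }
}
```

With clearReplica, a failed lookup of the primary (replicaId 0 in this sketch) leaves the secondary's entry in place, which is the fallback the description asks for while regions are being reassigned.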
[jira] [Comment Edited] (HBASE-18005) read replica: handle the case that region server hosting both primary replica and meta region is down
[ https://issues.apache.org/jira/browse/HBASE-18005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006621#comment-16006621 ] Lei Chen edited comment on HBASE-18005 at 5/11/17 3:18 PM: --- Thanks for the explanation and update. Yes, there is a gap between the primary meta region and its replica, defined by hbase.regionserver.meta.storefile.refresh.period, plus there is no notification mechanism at present, setting the hbase.meta.replica.count to 2 or 3 is indeed not a complete solution but an improve. Meanwhile, there is also a gap between the primary meta region and the one cached on the client side. The difference between the two gaps is how the gap is closed. The first one refreshes with a fixed interval while the second one updates when a miss is encountered. Please correct me if I'm wrong, the worst case I can imagine is 1. The locations of a primary region p1 and its replica r1 have changed. 2. The primary meta updates but its replica is not, due to the fixed interval 3. The region server that serves both primaries goes down 4. A client has not updated its meta cache after p1 and r1 was relocated, and now makes a get request to p1 That being said, I agree with you that the cached location of the replicas is still worth trying, and should be pardoned from clearing the meta cache, as you have proposed in the patch. was (Author: leochen4891): Thanks for the explanation and update. Yes, there is a gap between the primary meta region and its replica, defined by hbase.regionserver.meta.storefile.refresh.period, plus there is no notification mechanism at present, setting the hbase.meta.replica.count to 2 or 3 is indeed not a complete solution but an improve. Meanwhile, there is also a gap between the primary meta region and the one cached on the client side. The difference between the two gaps is how the gap is closed. The first one refreshes with a fixed interval while the second one updates see a miss. 
Please correct me if I'm wrong, the worst case I can imagine is 1. The locations of a primary region p1 and its replica r1 have changed. 2. The primary meta updates but its replica is not, due to the fixed interval 3. The region server that serves both primaries goes down 4. A client has not updated its meta cache after p1 and r1 was relocated, and now makes a get request to p1 That being said, I agree with you that the cached location of the replicas is still worth trying, and should be pardoned from clearing the meta cache, as you have proposed in the patch. > read replica: handle the case that region server hosting both primary replica > and meta region is down > - > > Key: HBASE-18005 > URL: https://issues.apache.org/jira/browse/HBASE-18005 > Project: HBase > Issue Type: Bug >Reporter: huaxiang sun >Assignee: huaxiang sun > Attachments: HBASE-18005-master-001.patch > > > Identified one corner case in testing that when the region server hosting > both primary replica and the meta region is down, the client tries to reload > the primary replica location from meta table, it is supposed to clean up only > the cached location for specific replicaId, but it clears caches for all > replicas. Please see > https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L813 > Since it takes some time for regions to be reassigned (including meta > region), the following may throw exception > https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java#L173 > This exception needs to be caught and it needs to get cached location (in > this case, the primary replica's location is not available). If there are > cached locations for other replicas, it can still go ahead to get stale > values from secondary replicas. 
> With meta replica, it still helps to not clean up the caches for all replicas > as the info from primary meta replica is up-to-date. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-18005) read replica: handle the case that region server hosting both primary replica and meta region is down
[ https://issues.apache.org/jira/browse/HBASE-18005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006621#comment-16006621 ] Lei Chen commented on HBASE-18005: -- Thanks for the explanation and update. Yes, there is a gap between the primary meta region and its replica, defined by hbase.regionserver.meta.storefile.refresh.period, plus there is no notification mechanism at present, setting the hbase.meta.replica.count to 2 or 3 is indeed not a complete solution but an improve. Meanwhile, there is also a gap between the primary meta region and the one cached on the client side. The difference between the two gaps is how the gap is closed. The first one refreshes with a fixed interval while the second one updates see a miss. Please correct me if I'm wrong, the worst case I can imagine is 1. The locations of a primary region p1 and its replica r1 have changed. 2. The primary meta updates but its replica is not, due to the fixed interval 3. The region server that serves both primaries goes down 4. A client has not updated its meta cache after p1 and r1 was relocated, and now makes a get request to p1 That being said, I agree with you that the cached location of the replicas is still worth trying, and should be pardoned from clearing the meta cache, as you have proposed in the patch. > read replica: handle the case that region server hosting both primary replica > and meta region is down > - > > Key: HBASE-18005 > URL: https://issues.apache.org/jira/browse/HBASE-18005 > Project: HBase > Issue Type: Bug >Reporter: huaxiang sun >Assignee: huaxiang sun > Attachments: HBASE-18005-master-001.patch > > > Identified one corner case in testing that when the region server hosting > both primary replica and the meta region is down, the client tries to reload > the primary replica location from meta table, it is supposed to clean up only > the cached location for specific replicaId, but it clears caches for all > replicas. 
Please see > https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L813 > Since it takes some time for regions to be reassigned (including meta > region), the following may throw exception > https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java#L173 > This exception needs to be caught and it needs to get cached location (in > this case, the primary replica's location is not available). If there are > cached locations for other replicas, it can still go ahead to get stale > values from secondary replicas. > With meta replica, it still helps to not clean up the caches for all replicas > as the info from primary meta replica is up-to-date. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-18005) read replica: handle the case that region server hosting both primary replica and meta region is down
[ https://issues.apache.org/jira/browse/HBASE-18005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005805#comment-16005805 ] Lei Chen commented on HBASE-18005: -- Hi Huaxiang, I'm running into this issue as well. Good work! May I ask which version of HBase you are using? I ask because I'm using HBase 1.1.2, which doesn't have the fix for a known meta table replication issue (https://issues.apache.org/jira/browse/HBASE-17238). With HBASE-17238 in place, setting hbase.meta.replica.count to a number greater than 1 should handle the case where the primary regions of a normal table and the meta table are both down. I'm curious whether you are in the same situation as me and cannot set hbase.meta.replica.count to, say, 3? (https://hbase.apache.org/book.html#_server_side_properties) > read replica: handle the case that region server hosting both primary replica > and meta region is down > - > > Key: HBASE-18005 > URL: https://issues.apache.org/jira/browse/HBASE-18005 > Project: HBase > Issue Type: Bug >Reporter: huaxiang sun >Assignee: huaxiang sun > Attachments: HBASE-18005-master-001.patch > > > Identified one corner case in testing that when the region server hosting > both primary replica and the meta region is down, the client tries to reload > the primary replica location from meta table, it is supposed to clean up only > the cached location for specific replicaId, but it clears caches for all > replicas. 
Please see > https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L813 > Since it takes some time for regions to be reassigned (including meta > region), the following may throw exception > https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java#L173 > This exception needs to be caught and it needs to get cached location (in > this case, the primary replica's location is not available). If there are > cached locations for other replicas, it can still go ahead to get stale > values from secondary replicas. > With meta replica, it still helps to not clean up the caches for all replicas > as the info from primary meta replica is up-to-date. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791565#comment-14791565 ] Lei Chen commented on HBASE-14082: -- Thank you all for helping me all the way. > Add replica id to JMX metrics names > --- > > Key: HBASE-14082 > URL: https://issues.apache.org/jira/browse/HBASE-14082 > Project: HBase > Issue Type: Improvement > Components: metrics >Reporter: Lei Chen >Assignee: Lei Chen > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: 14082-v6.patch, HBASE-14082-v1.patch, > HBASE-14082-v2.patch, HBASE-14082-v3.patch, HBASE-14082-v4.patch, > HBASE-14082-v5.patch > > > Today, via JMX, one cannot distinguish a primary region from a replica. A > possible solution is to add replica id to JMX metrics names. The benefits may > include, for example: > # Knowing the latency of a read request on a replica region means the first > attempt to the primary region has timed out. > # Write requests on replicas are due to the replication process, while the > ones on primary are from clients. > # In case of looking for hot spots of read operations, replicas should be > excluded since TIMELINE reads are sent to all replicas. > To implement, we can change the format of metrics names found at > {code}Hadoop->HBase->RegionServer->Regions->Attributes{code} > from > {code}namespace_<namespace>_table_<tablename>_region_<regionname>_metric_<metricname>{code} > to > {code}namespace_<namespace>_table_<tablename>_region_<regionname>_replicaid_<replicaid>_metric_<metricname>{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
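As a rough illustration of the naming scheme proposed in the description above, the following Java sketch assembles a per-region metric name with the replica id as an extra segment. RegionMetricNames and its methods are hypothetical helpers, not HBase classes, and the sketch assumes that replica id 0 denotes the primary region. It follows the format in the description only; the patch that was eventually committed may expose the replica id differently.

```java
/** Hypothetical helper illustrating the proposed metric name format; not an HBase class. */
final class RegionMetricNames {
    private RegionMetricNames() {}

    /** Builds namespace_<ns>_table_<table>_region_<region>_replicaid_<id>_metric_<metric>. */
    static String build(String namespace, String table, String region,
                        int replicaId, String metric) {
        return "namespace_" + namespace
            + "_table_" + table
            + "_region_" + region
            + "_replicaid_" + replicaId
            + "_metric_" + metric;
    }

    /** Assumption for this sketch: replica id 0 is the primary region. */
    static boolean isPrimary(int replicaId) {
        return replicaId == 0;
    }
}
```

A JMX consumer looking for read hot spots could then keep only the names whose replica id segment marks the primary, as the third benefit in the list suggests.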
[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739838#comment-14739838 ] Lei Chen commented on HBASE-14082: -- Thanks for rebasing the patch. > Add replica id to JMX metrics names > --- > > Key: HBASE-14082 > URL: https://issues.apache.org/jira/browse/HBASE-14082 > Project: HBase > Issue Type: Improvement > Components: metrics >Reporter: Lei Chen >Assignee: Lei Chen > Fix For: 2.0.0 > > Attachments: 14082-v6.patch, HBASE-14082-v1.patch, > HBASE-14082-v2.patch, HBASE-14082-v3.patch, HBASE-14082-v4.patch, > HBASE-14082-v5.patch > > > Today, via JMX, one cannot distinguish a primary region from a replica. A > possible solution is to add replica id to JMX metrics names. The benefits may > include, for example: > # Knowing the latency of a read request on a replica region means the first > attempt to the primary region has timed out. > # Write requests on replicas are due to the replication process, while the > ones on primary are from clients. > # In case of looking for hot spots of read operations, replicas should be > excluded since TIMELINE reads are sent to all replicas. > To implement, we can change the format of metrics names found at > {code}Hadoop->HBase->RegionServer->Regions->Attributes{code} > from > {code}namespace_<namespace>_table_<tablename>_region_<regionname>_metric_<metricname>{code} > to > {code}namespace_<namespace>_table_<tablename>_region_<regionname>_replicaid_<replicaid>_metric_<metricname>{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14293) TestStochasticBalancerJmxMetrics intermittently fails due to port conflict
[ https://issues.apache.org/jira/browse/HBASE-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708504#comment-14708504 ] Lei Chen commented on HBASE-14293: -- Forgot the link to the function: [here|http://svn.apache.org/viewvc/camel/trunk/components/camel-test/src/main/java/org/apache/camel/test/AvailablePortFinder.java?view=markup#l130] TestStochasticBalancerJmxMetrics intermittently fails due to port conflict -- Key: HBASE-14293 URL: https://issues.apache.org/jira/browse/HBASE-14293 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 2.0.0 Attachments: 14293-v1.txt, 14293-v2.txt From https://builds.apache.org/job/HBase-TRUNK/6748/testReport/junit/org.apache.hadoop.hbase/TestStochasticBalancerJmxMetrics/testJmxMetrics_EnsembleMode/ : {code} 2015-08-22 20:46:07,939 ERROR [M:0;asf900:59022] coprocessor.CoprocessorHost(518): The coprocessor org.apache.hadoop.hbase.JMXListener threw java.rmi.server.ExportException: Port already in use: 61120; nested exception is: java.net.BindException: Address already in use java.rmi.server.ExportException: Port already in use: 61120; nested exception is: java.net.BindException: Address already in use at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:329) at sun.rmi.transport.tcp.TCPTransport.exportObject(TCPTransport.java:237) at sun.rmi.transport.tcp.TCPEndpoint.exportObject(TCPEndpoint.java:411) ... 2015-08-22 20:49:41,755 DEBUG [main] hbase.TestStochasticBalancerJmxMetrics(93): Encountered exception when starting cluster. 
Trying port 61122 java.lang.IllegalStateException: A mini-cluster is already running at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:981) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:872) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:866) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:810) at org.apache.hadoop.hbase.TestStochasticBalancerJmxMetrics.setupBeforeClass(TestStochasticBalancerJmxMetrics.java:89) {code} When port conflict is detected, we try the next port. However, HTU#miniClusterRunning is true by this moment, leading to the second exception shown above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14293) TestStochasticBalancerJmxMetrics intermittently fails due to port conflict
[ https://issues.apache.org/jira/browse/HBASE-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708502#comment-14708502 ] Lei Chen commented on HBASE-14293: -- Patch v1 looks good. I'm also thinking that, if shutting down and restarting the miniCluster is costly, it might be better to pick an available port before starting the miniCluster, using a function like this. Maybe randomly choose a port and test it: if it is available, start the miniCluster; otherwise try up to 10 more random ports. What do you think? TestStochasticBalancerJmxMetrics intermittently fails due to port conflict -- Key: HBASE-14293 URL: https://issues.apache.org/jira/browse/HBASE-14293 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 2.0.0 Attachments: 14293-v1.txt, 14293-v2.txt From https://builds.apache.org/job/HBase-TRUNK/6748/testReport/junit/org.apache.hadoop.hbase/TestStochasticBalancerJmxMetrics/testJmxMetrics_EnsembleMode/ : {code} 2015-08-22 20:46:07,939 ERROR [M:0;asf900:59022] coprocessor.CoprocessorHost(518): The coprocessor org.apache.hadoop.hbase.JMXListener threw java.rmi.server.ExportException: Port already in use: 61120; nested exception is: java.net.BindException: Address already in use java.rmi.server.ExportException: Port already in use: 61120; nested exception is: java.net.BindException: Address already in use at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:329) at sun.rmi.transport.tcp.TCPTransport.exportObject(TCPTransport.java:237) at sun.rmi.transport.tcp.TCPEndpoint.exportObject(TCPEndpoint.java:411) ... 2015-08-22 20:49:41,755 DEBUG [main] hbase.TestStochasticBalancerJmxMetrics(93): Encountered exception when starting cluster. 
Trying port 61122 java.lang.IllegalStateException: A mini-cluster is already running at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:981) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:872) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:866) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:810) at org.apache.hadoop.hbase.TestStochasticBalancerJmxMetrics.setupBeforeClass(TestStochasticBalancerJmxMetrics.java:89) {code} When port conflict is detected, we try the next port. However, HTU#miniClusterRunning is true by this moment, leading to the second exception shown above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
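The suggestion in the comment above (probe a random port before starting the mini cluster, and give up after a bounded number of attempts) could be sketched in Java as follows. PortPicker, its method names, and the port range are assumptions made for illustration; this is not the Camel AvailablePortFinder linked in the thread, nor the code in any of the attached patches.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.Random;

/** Hypothetical helper: bounded random search for a free TCP port. */
final class PortPicker {
    private PortPicker() {}

    /** Returns true if a server socket could be bound to the port at the time of the check. */
    static boolean isFree(int port) {
        try (ServerSocket probe = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Tries up to maxAttempts random ports in [low, high]; throws if none was free. */
    static int pickFreePort(int low, int high, int maxAttempts, Random rnd) {
        for (int i = 0; i < maxAttempts; i++) {
            int candidate = low + rnd.nextInt(high - low + 1);
            if (isFree(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException(
            "no free port found after " + maxAttempts + " attempts");
    }
}
```

Because the attempt count is bounded, the search cannot spin forever when no port can be bound. The check is still racy: another process may grab the port between the probe and the actual JMX bind, so the caller may still want a retry around cluster start.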
[jira] [Commented] (HBASE-14293) TestStochasticBalancerJmxMetrics intermittently fails due to port conflict
[ https://issues.apache.org/jira/browse/HBASE-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708520#comment-14708520 ] Lei Chen commented on HBASE-14293: -- For patch v2: 1. It seems the do-while loop will keep trying indefinitely to obtain an available port. Under some extreme condition (e.g. network interface problems), will the loop get stuck, or will JUnit trigger a timeout? 2. Nit: if the purpose of the for loop is to try different ports, is it still needed now that the port-finding function is used? TestStochasticBalancerJmxMetrics intermittently fails due to port conflict -- Key: HBASE-14293 URL: https://issues.apache.org/jira/browse/HBASE-14293 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 2.0.0 Attachments: 14293-v1.txt, 14293-v2.txt From https://builds.apache.org/job/HBase-TRUNK/6748/testReport/junit/org.apache.hadoop.hbase/TestStochasticBalancerJmxMetrics/testJmxMetrics_EnsembleMode/ : {code} 2015-08-22 20:46:07,939 ERROR [M:0;asf900:59022] coprocessor.CoprocessorHost(518): The coprocessor org.apache.hadoop.hbase.JMXListener threw java.rmi.server.ExportException: Port already in use: 61120; nested exception is: java.net.BindException: Address already in use java.rmi.server.ExportException: Port already in use: 61120; nested exception is: java.net.BindException: Address already in use at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:329) at sun.rmi.transport.tcp.TCPTransport.exportObject(TCPTransport.java:237) at sun.rmi.transport.tcp.TCPEndpoint.exportObject(TCPEndpoint.java:411) ... 2015-08-22 20:49:41,755 DEBUG [main] hbase.TestStochasticBalancerJmxMetrics(93): Encountered exception when starting cluster. 
Trying port 61122 java.lang.IllegalStateException: A mini-cluster is already running at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:981) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:872) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:866) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:810) at org.apache.hadoop.hbase.TestStochasticBalancerJmxMetrics.setupBeforeClass(TestStochasticBalancerJmxMetrics.java:89) {code} When port conflict is detected, we try the next port. However, HTU#miniClusterRunning is true by this moment, leading to the second exception shown above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14293) TestStochasticBalancerJmxMetrics intermittently fails due to port conflict
[ https://issues.apache.org/jira/browse/HBASE-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708706#comment-14708706 ] Lei Chen commented on HBASE-14293: -- patch v3 looks good. +1 TestStochasticBalancerJmxMetrics intermittently fails due to port conflict -- Key: HBASE-14293 URL: https://issues.apache.org/jira/browse/HBASE-14293 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 2.0.0 Attachments: 14293-v1.txt, 14293-v2.txt, 14293-v3.txt From https://builds.apache.org/job/HBase-TRUNK/6748/testReport/junit/org.apache.hadoop.hbase/TestStochasticBalancerJmxMetrics/testJmxMetrics_EnsembleMode/ : {code} 2015-08-22 20:46:07,939 ERROR [M:0;asf900:59022] coprocessor.CoprocessorHost(518): The coprocessor org.apache.hadoop.hbase.JMXListener threw java.rmi.server.ExportException: Port already in use: 61120; nested exception is: java.net.BindException: Address already in use java.rmi.server.ExportException: Port already in use: 61120; nested exception is: java.net.BindException: Address already in use at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:329) at sun.rmi.transport.tcp.TCPTransport.exportObject(TCPTransport.java:237) at sun.rmi.transport.tcp.TCPEndpoint.exportObject(TCPEndpoint.java:411) ... 2015-08-22 20:49:41,755 DEBUG [main] hbase.TestStochasticBalancerJmxMetrics(93): Encountered exception when starting cluster. 
Trying port 61122 java.lang.IllegalStateException: A mini-cluster is already running at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:981) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:872) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:866) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:810) at org.apache.hadoop.hbase.TestStochasticBalancerJmxMetrics.setupBeforeClass(TestStochasticBalancerJmxMetrics.java:89) {code} When port conflict is detected, we try the next port. However, HTU#miniClusterRunning is true by this moment, leading to the second exception shown above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-14082: - Attachment: HBASE-14082-v5.patch Updates: 1. Javadoc fix Add replica id to JMX metrics names --- Key: HBASE-14082 URL: https://issues.apache.org/jira/browse/HBASE-14082 Project: HBase Issue Type: Improvement Components: metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-14082-v1.patch, HBASE-14082-v2.patch, HBASE-14082-v3.patch, HBASE-14082-v4.patch, HBASE-14082-v5.patch Today, via JMX, one cannot distinguish a primary region from a replica. A possible solution is to add replica id to JMX metrics names. The benefits may include, for example: # Knowing the latency of a read request on a replica region means the first attempt to the primary region has timed out. # Write requests on replicas are due to the replication process, while the ones on primary are from clients. # In case of looking for hot spots of read operations, replicas should be excluded since TIMELINE reads are sent to all replicas. To implement, we can change the format of metrics names found at {code}Hadoop->HBase->RegionServer->Regions->Attributes{code} from {code}namespace_<namespace>_table_<tablename>_region_<regionname>_metric_<metricname>{code} to {code}namespace_<namespace>_table_<tablename>_region_<regionname>_replicaid_<replicaid>_metric_<metricname>{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-14082: - Attachment: HBASE-14082-v4.patch Updates: 1. Javadoc and wrapping long lines Add replica id to JMX metrics names --- Key: HBASE-14082 URL: https://issues.apache.org/jira/browse/HBASE-14082 Project: HBase Issue Type: Improvement Components: metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-14082-v1.patch, HBASE-14082-v2.patch, HBASE-14082-v3.patch, HBASE-14082-v4.patch Today, via JMX, one cannot distinguish a primary region from a replica. A possible solution is to add replica id to JMX metrics names. The benefits may include, for example: # Knowing the latency of a read request on a replica region means the first attempt to the primary region has timed out. # Write requests on replicas are due to the replication process, while the ones on primary are from clients. # In case of looking for hot spots of read operations, replicas should be excluded since TIMELINE reads are sent to all replicas. To implement, we can change the format of metrics names found at {code}Hadoop->HBase->RegionServer->Regions->Attributes{code} from {code}namespace_<namespace>_table_<tablename>_region_<regionname>_metric_<metricname>{code} to {code}namespace_<namespace>_table_<tablename>_region_<regionname>_replicaid_<replicaid>_metric_<metricname>{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-14082: - Attachment: HBASE-14082-v3.patch Updates: 1. Moved replica id from the metrics name to a separate metric. 2. Updated the related test. Add replica id to JMX metrics names --- Key: HBASE-14082 URL: https://issues.apache.org/jira/browse/HBASE-14082 Project: HBase Issue Type: Improvement Components: metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-14082-v1.patch, HBASE-14082-v2.patch, HBASE-14082-v3.patch Today, via JMX, one cannot distinguish a primary region from a replica. A possible solution is to add replica id to JMX metrics names. The benefits may include, for example: # Knowing the latency of a read request on a replica region means the first attempt to the primary region has timed out. # Write requests on replicas are due to the replication process, while the ones on primary are from clients. # In case of looking for hot spots of read operations, replicas should be excluded since TIMELINE reads are sent to all replicas. To implement, we can change the format of metrics names found at {code}Hadoop->HBase->RegionServer->Regions->Attributes{code} from {code}namespace_<namespace>_table_<tablename>_region_<regionname>_metric_<metricname>{code} to {code}namespace_<namespace>_table_<tablename>_region_<regionname>_replicaid_<replicaid>_metric_<metricname>{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-14082: - Status: Patch Available (was: Open) Add replica id to JMX metrics names --- Key: HBASE-14082 URL: https://issues.apache.org/jira/browse/HBASE-14082 Project: HBase Issue Type: Improvement Components: metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-14082-v1.patch, HBASE-14082-v2.patch, HBASE-14082-v3.patch Today, via JMX, one cannot distinguish a primary region from a replica. A possible solution is to add replica id to JMX metrics names. The benefits may include, for example: # Knowing the latency of a read request on a replica region means the first attempt to the primary region has timed out. # Write requests on replicas are due to the replication process, while the ones on primary are from clients. # In case of looking for hot spots of read operations, replicas should be excluded since TIMELINE reads are sent to all replicas. To implement, we can change the format of metrics names found at {code}Hadoop->HBase->RegionServer->Regions->Attributes{code} from {code}namespace_<namespace>_table_<tablename>_region_<regionname>_metric_<metricname>{code} to {code}namespace_<namespace>_table_<tablename>_region_<regionname>_replicaid_<replicaid>_metric_<metricname>{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694425#comment-14694425 ] Lei Chen commented on HBASE-14082: -- Thanks for the suggestion. I will upload a patch for 1.x soon. Add replica id to JMX metrics names --- Key: HBASE-14082 URL: https://issues.apache.org/jira/browse/HBASE-14082 Project: HBase Issue Type: Improvement Components: metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-14082-v1.patch, HBASE-14082-v2.patch Today, via JMX, one cannot distinguish a primary region from a replica. A possible solution is to add replica id to JMX metrics names. The benefits may include, for example: # Knowing the latency of a read request on a replica region means the first attempt to the primary region has timed out. # Write requests on replicas are due to the replication process, while the ones on primary are from clients. # In case of looking for hot spots of read operations, replicas should be excluded since TIMELINE reads are sent to all replicas. To implement, we can change the format of metrics names found at {code}Hadoop->HBase->RegionServer->Regions->Attributes{code} from {code}namespace_<namespace>_table_<tablename>_region_<regionname>_metric_<metricname>{code} to {code}namespace_<namespace>_table_<tablename>_region_<regionname>_replicaid_<replicaid>_metric_<metricname>{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-13965: - Attachment: HBASE-13965-branch-1.patch Updates: 1. Included 13965-addendum which tries 5 different ports for JMX connection 2. Fix balancer already exists error in TestAssignmentManager. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0 Attachments: 13965-addendum.txt, HBASE-13965-branch-1.patch, HBASE-13965-v10.patch, HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
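To make the kind of visibility requested above concrete, here is a small Java sketch that attributes an overall balancing cost to individual cost functions. It assumes the overall cost is the sum of each function's multiplier times its raw cost; CostBreakdown, its methods, and the numbers in the usage note are hypothetical and are not taken from the patches attached to this issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-cost-function breakdown; assumes overall = sum(multiplier * rawCost). */
final class CostBreakdown {
    private final Map<String, Double> weighted = new LinkedHashMap<>();
    private double overall;

    /** Records one cost function's weighted contribution, replacing any earlier value. */
    void add(String name, double multiplier, double rawCost) {
        double contribution = multiplier * rawCost;
        Double previous = weighted.put(name, contribution);
        if (previous != null) {
            overall -= previous;
        }
        overall += contribution;
    }

    /** Sum of all weighted contributions recorded so far. */
    double overall() {
        return overall;
    }

    /** Fraction of the overall cost attributable to one function (0 if unknown or overall is 0). */
    double share(String name) {
        if (overall == 0) {
            return 0;
        }
        return weighted.getOrDefault(name, 0.0) / overall;
    }
}
```

For example, with illustrative values add("LocalityCost", 25, 0.4) and add("RegionCountSkewCost", 500, 0.02), each function contributes about half of the overall cost, which is the sort of attribution that is hard to see when only the tunable weights are known.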
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-13965: - Attachment: HBASE-13965-branch-1-v2.patch Updates: 1. Wrapped a long line (> 100 characters). The failed test from the last patch does not seem to be related. Here is the log:
testWalRollOnLowReplication(org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS) Time elapsed: 3.804 sec ERROR!
java.lang.RuntimeException: sync aborted
	at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:491)
	at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.insert(WALProcedureStore.java:334)
	at org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS.testWalRollOnLowReplication(TestWALProcedureStoreOnHDFS.java:189)
Caused by: org.apache.hadoop.ipc.RemoteException: File /test-logs/state-0006.log could only be replicated to 2 nodes instead of minReplication (=3). There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1471)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2791)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:606)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:455)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
	at org.apache.hadoop.ipc.Client.call(Client.java:1411)
	at org.apache.hadoop.ipc.Client.call(Client.java:1364)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:368)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1449)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1270)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:526)
Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0 Attachments: 13965-addendum.txt, HBASE-13965-branch-1-v2.patch, HBASE-13965-branch-1.patch, HBASE-13965-v10.patch, HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-13965: - Attachment: HBASE-13965-v10.patch Updates: 1. Spelling and formatting 2. LOG level changed to error when failed to get size of all tables. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v10.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-13965: - Status: Patch Available (was: Open) Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v10.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
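To make the "attribute each cost function's contribution to overall cost" point concrete, here is a small sketch. It assumes the overall cost is a weighted sum of the individual cost function results, which is how the issue description reads; the class `CostBreakdown`, its `Term` type, and the weights and costs in `main` are all made up for illustration and are not taken from the balancer or the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: overall cost = sum over cost functions of (weight * cost).
final class CostBreakdown {

    /** One cost function: its tunable weight and its current raw cost. */
    static final class Term {
        final double weight;
        final double cost;

        Term(double weight, double cost) {
            this.weight = weight;
            this.cost = cost;
        }

        double weighted() {
            return weight * cost;
        }
    }

    /** Weighted sum over all cost functions. */
    static double overallCost(Map<String, Term> terms) {
        double total = 0.0;
        for (Term t : terms.values()) {
            total += t.weighted();
        }
        return total;
    }

    /** Fraction of the overall cost that each function accounts for. */
    static Map<String, Double> contributions(Map<String, Term> terms) {
        double total = overallCost(terms);
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Term> e : terms.entrySet()) {
            shares.put(e.getKey(), total == 0.0 ? 0.0 : e.getValue().weighted() / total);
        }
        return shares;
    }

    public static void main(String[] args) {
        Map<String, Term> terms = new LinkedHashMap<>();
        terms.put("LocalityCost", new Term(4.0, 0.5));          // illustrative numbers only
        terms.put("RegionReplicaRackCost", new Term(8.0, 0.25));
        terms.put("RegionCountSkewCost", new Term(16.0, 0.25));
        System.out.println("overall = " + overallCost(terms));
        System.out.println("shares  = " + contributions(terms));
    }
}
```

Exposing both the per-function values and the total is what would let an operator see which weight to adjust when one function dominates.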
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652765#comment-14652765 ] Lei Chen commented on HBASE-13965: -- +1 Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0 Attachments: 13965-addendum.txt, HBASE-13965-v10.patch, HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652247#comment-14652247 ] Lei Chen commented on HBASE-14082: -- Would it be simpler to put the replica_id in the Regions as well, instead of creating a new MBean? The replica id could then be queried using wildcard matching, without having to search a name-to-replica_id map. e.g.
{code}
Regions: {
  namespace_default_table_foo_region_aaabbb_metric_mutateCount: 100,
  namespace_default_table_foo_region_aaabbb_metric_replicaid: 0,
  namespace_default_table_foo_region_bbbccc_metric_mutateCount: 100,
  namespace_default_table_foo_region_bbbccc_metric_replicaid: 1,
}
{code}
Add replica id to JMX metrics names --- Key: HBASE-14082 URL: https://issues.apache.org/jira/browse/HBASE-14082 Project: HBase Issue Type: Improvement Components: metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-14082-v1.patch, HBASE-14082-v2.patch Today, via JMX, one cannot distinguish a primary region from a replica. A possible solution is to add replica id to JMX metrics names. The benefits may include, for example: # Knowing the latency of a read request on a replica region means the first attempt to the primary region has timed out. # Write requests on replicas are due to the replication process, while the ones on primary are from clients. # In case of looking for hot spots of read operations, replicas should be excluded since TIMELINE reads are sent to all replicas. To implement, we can change the format of metrics names found at {code}Hadoop-HBase-RegionServer-Regions-Attributes{code} from {code}namespace_namespace_table_tablename_region_regionname_metric_metricname{code} to {code}namespace_namespace_table_tablename_region_regionname_replicaid_replicaid_metric_metricname{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
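As a rough illustration of the lookup suggested in the comment above, a consumer that has already read the Regions attributes into a map could find replica ids by suffix matching. This is a sketch under assumptions: the class `ReplicaIdLookup` and its methods are hypothetical, the attribute map is assumed to hold numeric values keyed by the metric names shown in the example, and no real JMX call is made.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical consumer of the "Regions" attributes, already loaded into a map.
final class ReplicaIdLookup {

    private static final String REPLICA_SUFFIX = "_metric_replicaid";

    /** Maps each region prefix (the part before "_metric_") to its replica id. */
    static Map<String, Long> replicaIdsByRegion(Map<String, Long> attributes) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : attributes.entrySet()) {
            String name = e.getKey();
            if (name.endsWith(REPLICA_SUFFIX)) {
                String prefix = name.substring(0, name.length() - REPLICA_SUFFIX.length());
                result.put(prefix, e.getValue());
            }
        }
        return result;
    }

    /** Sums one metric over primary regions only (replica id 0), skipping replicas. */
    static long sumOverPrimaries(Map<String, Long> attributes, String metric) {
        long sum = 0;
        for (Map.Entry<String, Long> e : replicaIdsByRegion(attributes).entrySet()) {
            if (e.getValue() == 0L) {
                Long value = attributes.get(e.getKey() + "_metric_" + metric);
                if (value != null) {
                    sum += value;
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Long> regions = new LinkedHashMap<>();
        regions.put("namespace_default_table_foo_region_aaabbb_metric_mutateCount", 100L);
        regions.put("namespace_default_table_foo_region_aaabbb_metric_replicaid", 0L);
        regions.put("namespace_default_table_foo_region_bbbccc_metric_mutateCount", 100L);
        regions.put("namespace_default_table_foo_region_bbbccc_metric_replicaid", 1L);
        System.out.println(replicaIdsByRegion(regions));
        System.out.println(sumOverPrimaries(regions, "mutateCount"));
    }
}
```

Because the existing metric names are untouched, a parser that ignores the extra `replicaid` entry keeps working, which is the compatibility argument for this layout.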
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652314#comment-14652314 ] Lei Chen commented on HBASE-13965: -- thanks, I will update soon Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13965-v10.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-13965: - Attachment: HBASE-13965-v11.patch Updates: 1. License added for {{hbase-hadoop2-compat/src/main/resources/x/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource}} Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13965-v10.patch, HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645177#comment-14645177 ] Lei Chen commented on HBASE-14082: -- I noticed that there is a function called isDefaultReplica(int replicaid) in RegionReplicaUtil.java. Does "default replica" mean the primary region, and should I use this function instead of checking for replicaid 0? Add replica id to JMX metrics names --- Key: HBASE-14082 URL: https://issues.apache.org/jira/browse/HBASE-14082 Project: HBase Issue Type: Improvement Components: metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-14082-v1.patch, HBASE-14082-v2.patch Today, via JMX, one cannot distinguish a primary region from a replica. A possible solution is to add replica id to JMX metrics names. The benefits may include, for example: # Knowing the latency of a read request on a replica region means the first attempt to the primary region has timed out. # Write requests on replicas are due to the replication process, while the ones on primary are from clients. # In case of looking for hot spots of read operations, replicas should be excluded since TIMELINE reads are sent to all replicas. To implement, we can change the format of metrics names found at {code}Hadoop-HBase-RegionServer-Regions-Attributes{code} from {code}namespace_namespace_table_tablename_region_regionname_metric_metricname{code} to {code}namespace_namespace_table_tablename_region_regionname_replicaid_replicaid_metric_metricname{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-14082: - Attachment: HBASE-14082-v2.patch Updates: 1. When replica id > 0, use the following format {code}namespace_namespace_table_tablename_region_regionname_replicaid_replicaid_metric_metricname{code} 2. Test case updated. Add replica id to JMX metrics names --- Key: HBASE-14082 URL: https://issues.apache.org/jira/browse/HBASE-14082 Project: HBase Issue Type: Improvement Components: metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-14082-v1.patch, HBASE-14082-v2.patch Today, via JMX, one cannot distinguish a primary region from a replica. A possible solution is to add replica id to JMX metrics names. The benefits may include, for example: # Knowing the latency of a read request on a replica region means the first attempt to the primary region has timed out. # Write requests on replicas are due to the replication process, while the ones on primary are from clients. # In case of looking for hot spots of read operations, replicas should be excluded since TIMELINE reads are sent to all replicas. To implement, we can change the format of metrics names found at {code}Hadoop-HBase-RegionServer-Regions-Attributes{code} from {code}namespace_namespace_table_tablename_region_regionname_metric_metricname{code} to {code}namespace_namespace_table_tablename_region_regionname_replicaid_replicaid_metric_metricname{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
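The naming rule in this update could look something like the sketch below. It assumes the replicaid segment is added only for non-primary replicas (replica id greater than 0), so that names for primary regions stay unchanged, as in the primary/secondary example elsewhere in this thread. The class `RegionMetricNames` and its method are hypothetical and are not the code in HBASE-14082-v2.patch.

```java
// Hypothetical name builder following the format strings quoted in the issue description.
final class RegionMetricNames {

    /**
     * Builds a per-region metric name. The "_replicaid_<id>" segment is inserted
     * only when replicaId > 0 (assumption: primary regions keep the old format).
     */
    static String metricName(String namespace, String table, String region,
                             int replicaId, String metric) {
        StringBuilder sb = new StringBuilder()
                .append("namespace_").append(namespace)
                .append("_table_").append(table)
                .append("_region_").append(region);
        if (replicaId > 0) {
            sb.append("_replicaid_").append(replicaId);
        }
        return sb.append("_metric_").append(metric).toString();
    }

    public static void main(String[] args) {
        // "abc123" is a made-up encoded region name.
        System.out.println(metricName("default", "sales", "abc123", 0, "storeCount"));
        System.out.println(metricName("default", "sales", "abc123", 1, "storeCount"));
    }
}
```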
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-13965: - Attachment: HBASE-13965-v9.patch Updates: 1. added a null pointer check in getMetrics( ) Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642069#comment-14642069 ] Lei Chen commented on HBASE-13965: -- I will attach a summary and a patch soon. Thanks for the reminder. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-13965: - Attachment: HBase-13965-JConsole.png Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639875#comment-14639875 ] Lei Chen commented on HBASE-14082: -- If an existing parser uses exact-string matching or order-sensitive key-value matching, the new names may not be backwards compatible with it. For example, if we have 2 regions, one primary and one secondary, {{namespace_default_table_sales_region__metric_storeCount}} {{namespace_default_table_sales_region__metric_storeCount}} becomes {{namespace_default_table_sales_region__metric_storeCount}} {{namespace_default_table_sales_region__replicaid_1_metric_storeCount}}. Since I'm not familiar with any existing parser, please correct me if I have made a false assumption. Add replica id to JMX metrics names --- Key: HBASE-14082 URL: https://issues.apache.org/jira/browse/HBASE-14082 Project: HBase Issue Type: Improvement Components: metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-14082-v1.patch Today, via JMX, one cannot distinguish a primary region from a replica. A possible solution is to add replica id to JMX metrics names. The benefits may include, for example: # Knowing the latency of a read request on a replica region means the first attempt to the primary region has timed out. # Write requests on replicas are due to the replication process, while the ones on primary are from clients. # In case of looking for hot spots of read operations, replicas should be excluded since TIMELINE reads are sent to all replicas. To implement, we can change the format of metrics names found at {code}Hadoop-HBase-RegionServer-Regions-Attributes{code} from {code}namespace_namespace_table_tablename_region_regionname_metric_metricname{code} to {code}namespace_namespace_table_tablename_region_regionname_replicaid_replicaid_metric_metricname{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
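For the compatibility concern above, a parser that treats the replicaid segment as optional would accept both the old and the new names. The following is a minimal sketch, not an existing parser: the class `RegionMetricNameParser` is hypothetical, "abc123" is a made-up encoded region name, and the regular expression assumes that namespace, table and region names do not themselves contain the `_table_`, `_region_`, `_replicaid_` or `_metric_` markers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser that tolerates names with or without a replica id segment.
final class RegionMetricNameParser {

    private static final Pattern NAME = Pattern.compile(
            "^namespace_(.+?)_table_(.+?)_region_(.+?)(?:_replicaid_(\\d+))?_metric_(.+)$");

    /** Parsed pieces of a per-region metric name. */
    static final class Parsed {
        final String namespace;
        final String table;
        final String region;
        final int replicaId; // 0 when the name carries no replicaid segment
        final String metric;

        Parsed(String namespace, String table, String region, int replicaId, String metric) {
            this.namespace = namespace;
            this.table = table;
            this.region = region;
            this.replicaId = replicaId;
            this.metric = metric;
        }
    }

    /** Returns the parsed name, or null if the input does not follow the expected layout. */
    static Parsed parse(String name) {
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            return null;
        }
        int replicaId = m.group(4) == null ? 0 : Integer.parseInt(m.group(4));
        return new Parsed(m.group(1), m.group(2), m.group(3), replicaId, m.group(5));
    }

    public static void main(String[] args) {
        Parsed primary = parse("namespace_default_table_sales_region_abc123_metric_storeCount");
        Parsed replica =
                parse("namespace_default_table_sales_region_abc123_replicaid_1_metric_storeCount");
        System.out.println(primary.region + " replica " + primary.replicaId);
        System.out.println(replica.region + " replica " + replica.replicaId);
    }
}
```

A parser written with exact-string or fixed-position matching would not have this tolerance, which is the backwards-compatibility risk described in the comment.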
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639877#comment-14639877 ] Lei Chen commented on HBASE-13965: -- I've been trying to apply the patch to a test HBase cluster to verify it. I'm currently running into some compatibility issues and will update the status once I have results. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632087#comment-14632087 ] Lei Chen commented on HBASE-14082: -- Changing the metrics names introduces a compatibility issue. Can we instead add one metric for each region, with its replica id as the value? For example: {{namespace_default_table_sales_region_00254ff636363bd577b9a66de7cc4bbf_metric_storeCount}} {{namespace_default_table_sales_region_00254ff636363bd577b9a66de7cc4bbf_metric_storeFileCount}} {{namespace_default_table_sales_region_00254ff636363bd577b9a66de7cc4bbf_metric_memStoreSize}} {{{color:red}namespace_default_table_sales_region_00254ff636363bd577b9a66de7cc4bbf_metric_replicaid{color}}} Add replica id to JMX metrics names --- Key: HBASE-14082 URL: https://issues.apache.org/jira/browse/HBASE-14082 Project: HBase Issue Type: Improvement Components: metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-14082-v1.patch Today, via JMX, one cannot distinguish a primary region from a replica. A possible solution is to add replica id to JMX metrics names. The benefits may include, for example: # Knowing the latency of a read request on a replica region means the first attempt to the primary region has timed out. # Write requests on replicas are due to the replication process, while the ones on primary are from clients. # In case of looking for hot spots of read operations, replicas should be excluded since TIMELINE reads are sent to all replicas. To implement, we can change the format of metrics names found at {code}Hadoop-HBase-RegionServer-Regions-Attributes{code} from {code}namespace_namespace_table_tablename_region_regionname_metric_metricname{code} to {code}namespace_namespace_table_tablename_region_regionname_replicaid_replicaid_metric_metricname{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-14082: - Attachment: HBASE-14082-v1.patch Updates: 1. Added getReplicaId() to MetricsRegionWrapper 2. Inserted replicaid and value pair to MetricsRegionSourceImpl 3. Updated test case TestMetricsRegion Add replica id to JMX metrics names --- Key: HBASE-14082 URL: https://issues.apache.org/jira/browse/HBASE-14082 Project: HBase Issue Type: Improvement Components: metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-14082-v1.patch Today, via JMX, one cannot distinguish a primary region from a replica. A possible solution is to add replica id to JMX metrics names. The benefits may include, for example: # Knowing the latency of a read request on a replica region means the first attempt to the primary region has timeout. # Write requests on replicas are due to the replication process, while the ones on primary are from clients. # In case of looking for hot spots of read operations, replicas should be excluded since TIMELINE reads are sent to all replicas. To implement, we can change the format of metrics names found at {code}Hadoop-HBase-RegionServer-Regions-Attributes{code} from {code}namespace_namespace_table_tablename_region_regionname_metric_metricname{code} to {code}namespace_namespace_table_tablename_region_regionname_replicaid_replicaid_metric_metricname{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-14082: - Status: Patch Available (was: Open) Add replica id to JMX metrics names --- Key: HBASE-14082 URL: https://issues.apache.org/jira/browse/HBASE-14082 Project: HBase Issue Type: Improvement Components: metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-14082-v1.patch Today, via JMX, one cannot distinguish a primary region from a replica. A possible solution is to add replica id to JMX metrics names. The benefits may include, for example: # Knowing the latency of a read request on a replica region means the first attempt to the primary region has timeout. # Write requests on replicas are due to the replication process, while the ones on primary are from clients. # In case of looking for hot spots of read operations, replicas should be excluded since TIMELINE reads are sent to all replicas. To implement, we can change the format of metrics names found at {code}Hadoop-HBase-RegionServer-Regions-Attributes{code} from {code}namespace_namespace_table_tablename_region_regionname_metric_metricname{code} to {code}namespace_namespace_table_tablename_region_regionname_replicaid_replicaid_metric_metricname{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628738#comment-14628738 ] Lei Chen commented on HBASE-14082: -- Thanks for pointing out the compatibility issue. I will propose a change shortly. Add replica id to JMX metrics names --- Key: HBASE-14082 URL: https://issues.apache.org/jira/browse/HBASE-14082 Project: HBase Issue Type: Improvement Components: metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-14082-v1.patch Today, via JMX, one cannot distinguish a primary region from a replica. A possible solution is to add replica id to JMX metrics names. The benefits may include, for example: # Knowing the latency of a read request on a replica region means the first attempt to the primary region has timeout. # Write requests on replicas are due to the replication process, while the ones on primary are from clients. # In case of looking for hot spots of read operations, replicas should be excluded since TIMELINE reads are sent to all replicas. To implement, we can change the format of metrics names found at {code}Hadoop-HBase-RegionServer-Regions-Attributes{code} from {code}namespace_namespace_table_tablename_region_regionname_metric_metricname{code} to {code}namespace_namespace_table_tablename_region_regionname_replicaid_replicaid_metric_metricname{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-13965: - Attachment: HBASE-13965-v8.patch Updates: 1. Use the number of all tables (including system tables) to calculate the size of the MRU map. This should be fine since the goal is to avoid OOM, not to calculate the exact number of metrics needed. 2. Formatting and spelling improvements. TODO: 1. The unit test uses 61120 as the JMX registry port. One of the recent QA test results reported a "Port already in use" error. Should I change the port? 2. The last two patches failed the core tests. However, I'm not sure that the failed test, TestWALProcedureStoreOnHDFS.testWalRollOnLowReplication, is related to this patch. 3. As for removing the per-table mode entirely, I'm not sure it should be included in this JIRA. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. 
What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
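The first update in the message above sizes the MRU map from the table count so that the number of retained metrics stays bounded. A minimal sketch of such a size-bounded, access-ordered map, built on the standard java.util.LinkedHashMap, might look like the following. The class name BoundedMruMap and the way the capacity is chosen are assumptions for illustration only; this is not code from the HBASE-13965 patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of a bounded most-recently-used map. */
final class BoundedMruMap<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    BoundedMruMap(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the least recently used entry.
        return size() > maxEntries;
    }
}
```

A caller would pass a capacity derived from the table count, for example the number of tables multiplied by the number of metrics kept per table. Once the map holds that many entries, inserting a new one evicts the least recently used entry, which is what keeps memory use bounded.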
[jira] [Created] (HBASE-14082) Add replica id to JMX metrics names
Lei Chen created HBASE-14082: Summary: Add replica id to JMX metrics names Key: HBASE-14082 URL: https://issues.apache.org/jira/browse/HBASE-14082 Project: HBase Issue Type: Improvement Components: metrics Reporter: Lei Chen Assignee: Lei Chen Today, via JMX, one cannot distinguish a primary region from a replica. A possible solution is to add the replica id to JMX metrics names. The benefits may include, for example: # Latency recorded for a read request on a replica region indicates that the first attempt to the primary region timed out. # Write requests on replicas are due to the replication process, while the ones on the primary are from clients. # When looking for hot spots of read operations, replicas should be excluded, since TIMELINE reads are sent to all replicas. To implement, we can change the format of metrics names found at {code}Hadoop-HBase-RegionServer-Regions-Attributes{code} from {code}namespace_namespace_table_tablename_region_regionname_metric_metricname{code} to {code}namespace_namespace_table_tablename_region_regionname_replicaid_replicaid_metric_metricname{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
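To illustrate how a monitoring script might consume the proposed name format, here is a small sketch that extracts the replica id from a metric name. It assumes the replica id is rendered as a decimal integer and that id 0 denotes the primary region. The class and method names are hypothetical and are not part of HBase.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading the proposed "_replicaid_<id>_metric_" segment. */
final class ReplicaMetricNames {
    // Assumption: the id is a short decimal number placed directly before "_metric_".
    private static final Pattern REPLICA_ID = Pattern.compile("_replicaid_(\\d{1,9})_metric_");

    /** Returns the replica id if the name uses the proposed format, otherwise empty. */
    static OptionalInt replicaId(String metricName) {
        Matcher m = REPLICA_ID.matcher(metricName);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    /** Assumption: replica id 0 is the primary region. */
    static boolean isPrimary(String metricName) {
        OptionalInt id = replicaId(metricName);
        return id.isPresent() && id.getAsInt() == 0;
    }
}
```

A hot-spot report could then skip every metric for which isPrimary returns false. Note that a table or region name that itself contains the text _replicaid_ would make this kind of parsing ambiguous, so the sketch is only a starting point.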
[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627396#comment-14627396 ] Lei Chen commented on HBASE-14082: -- If the changes are # adding a getter, getReplicaId(), to MetricsRegionWrapper.java # inserting a string, _replicaid_ + regionWrapper.getReplicaId(), into MetricsRegionSourceImpl.java should I include a test case? I'm not sure whether every change is expected to be covered by unit tests. Add replica id to JMX metrics names --- Key: HBASE-14082 URL: https://issues.apache.org/jira/browse/HBASE-14082 Project: HBase Issue Type: Improvement Components: metrics Reporter: Lei Chen Assignee: Lei Chen Today, via JMX, one cannot distinguish a primary region from a replica. A possible solution is to add the replica id to JMX metrics names. The benefits may include, for example: # Latency recorded for a read request on a replica region indicates that the first attempt to the primary region timed out. # Write requests on replicas are due to the replication process, while the ones on the primary are from clients. # When looking for hot spots of read operations, replicas should be excluded, since TIMELINE reads are sent to all replicas. To implement, we can change the format of metrics names found at {code}Hadoop-HBase-RegionServer-Regions-Attributes{code} from {code}namespace_namespace_table_tablename_region_regionname_metric_metricname{code} to {code}namespace_namespace_table_tablename_region_regionname_replicaid_replicaid_metric_metricname{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
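The two changes listed in the comment above can be sketched roughly as follows. RegionWrapperSketch and RegionMetricPrefix are hypothetical stand-ins for MetricsRegionWrapper and MetricsRegionSourceImpl; apart from getReplicaId(), which the comment names, the method names are assumptions, and the real classes may be structured differently.

```java
/** Hypothetical stand-in for the wrapper interface, with the new getter added. */
interface RegionWrapperSketch {
    String getNamespace();
    String getTableName();
    String getRegionName();
    int getReplicaId(); // the getter the comment proposes to add
}

/** Hypothetical stand-in for the code that builds the metric name prefix. */
final class RegionMetricPrefix {
    static String of(RegionWrapperSketch w) {
        return "namespace_" + w.getNamespace()
            + "_table_" + w.getTableName()
            + "_region_" + w.getRegionName()
            + "_replicaid_" + w.getReplicaId() // the inserted segment
            + "_metric_";
    }
}
```

Because the change is a pure string transformation, a unit test could be as small as building a wrapper with known values and comparing the resulting prefix against the expected string.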
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624668#comment-14624668 ] Lei Chen commented on HBASE-13965: -- Do you mean removing all the code for per-table balancing, as well as any related documentation? If balancing is always performed on the ensemble table, then no table name is needed in the JMX metrics names, right? Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624048#comment-14624048 ] Lei Chen commented on HBASE-13965: -- Thanks for pointing it out. I ran some tests and found that there are two system tables, hbase:meta and hbase:namespace. The first one, hbase:meta, is not given to the balancer, while hbase:namespace is. Maybe I can just get the number of all tables and subtract 1 to exclude hbase:meta? That would avoid the loop.
{code}
tablesCount = services.getTableDescriptors().getAll().size();
// -1 for removing a system table, hbase:meta, which will not be balanced.
tablesCount--;
{code}
However, this relies on the assumption that only one non-balanced system table exists. Is there a better solution? Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. 
What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
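On the question in the comment above about relying on exactly one non-balanced system table: one possible alternative, sketched here in plain Java with no HBase API, is to subtract only those excluded tables that are actually present. The class name, the method name, and the idea of passing the excluded names as a set are all assumptions for illustration and were not proposed in the thread.

```java
import java.util.Set;

/** Hypothetical helper; table names are plain strings here, not HBase types. */
final class BalancedTableCount {
    /**
     * Counts the tables that would be handed to the balancer.
     * Iterates only over the (small) set of excluded names, not over all tables.
     */
    static int count(Set<String> allTableNames, Set<String> notBalanced) {
        int excluded = 0;
        for (String name : notBalanced) {
            if (allTableNames.contains(name)) {
                excluded++;
            }
        }
        return allTableNames.size() - excluded;
    }
}
```

With the two system tables named in the comment plus one user table, and hbase:meta as the only excluded name, this yields 2. The design choice is that adding another excluded table later only changes the set that is passed in, not the counting logic.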
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623933#comment-14623933 ] Lei Chen commented on HBASE-13965: -- Thanks for the feedback. {Should we do early return if the table is system table ?} Should the filtering of tables happen before or outside updateStochasticCosts (e.g., in HMaster)? Since HMaster sends hbase:namespace for balancing, there might be a reason for that, and I feel it may help to include its costs in JMX, just in case. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622705#comment-14622705 ] Lei Chen commented on HBASE-13965: -- Ted, Clay, and stack, thanks for the suggestions. Currently, in ensemble mode, the combined table is named ensemble, and I feel it might be a good idea to use that name directly, for example ensemble_Overall or ensemble_costFunction1. Similarly, in per-table mode, the system table hbase:namespace can be reported to JMX using its name directly as well. The benefit is that we avoid any special-case handling in the balancer: it just calculates and reports costs for whatever tables HMaster gives it. I'm updating the unit test to cover both modes. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. 
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622782#comment-14622782 ] Lei Chen commented on HBASE-13965: -- I think the ensemble table is more like a temporary table, and it is not in any namespace. https://github.com/apache/hbase/blob/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java#L792 Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-13965: - Attachment: HBASE-13965-v7.patch Updates: 1. Overloaded balanceCluster() to pass the table name to the balancer. 2. Moved some string constants to HConstants.java. 3. The stochastic balancer automatically adjusts the JMX metrics size to the number of tables. 4. The stochastic balancer handles both ensemble and per-table modes. 5. Updated tests to cover both modes. TODO: 1. The tests currently only use the miniCluster to save and read JMX metrics, which means that the tables are not actually stored in HBase. I'm not sure whether this is adequate or whether we need to save real tables to the miniCluster and balance them for real. Sorry guys, I still cannot upload the patch file to Review Board. The diff file always gets a "No valid separator after the filename was found in the diff header" error. If I manually touch up the file by adding (revision ) or (working copy), I get a "revision cannot be found" error. The command-line tool rbt has the same problem. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. 
Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622900#comment-14622900 ] Lei Chen commented on HBASE-13965: -- Thanks for clarifying. I will use hbase:ensemble for this cute table-like thing - smile. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621641#comment-14621641 ] Lei Chen commented on HBASE-13965: -- I also found that when {{hbase.master.loadbalancer.bytable}} is set to true, balancing is also performed on the table hbase:namespace, which is a system table. Should the costs of hbase:namespace be reported to JMX the same way as those of user tables? Any ideas? Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621457#comment-14621457 ] Lei Chen commented on HBASE-13965: -- I have found a problem related to HBASE-5231 (per-table load balancing). It seems that balancing is done by iterating over tables. https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1219-L1228 This can be configured to run in per-table mode or ensemble mode. https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java#L956-L962 In ensemble mode, all the tables are copied into an ensemble table for balancing. The mode is set by {{hbase.master.loadbalancer.bytable}}. My question is how to name the metrics when balancing is in ensemble mode. For example, suppose we have two tables, Table1 and Table2, and N cost functions. In per-table mode, each table will have an overall cost and one cost per cost function. {{Table1_Overall}} {{Table1_costFunction}} x N {{Table2_Overall}} {{Table2_costFunction}} x N In ensemble mode, there will be only one overall cost and one set of function costs. {{ensemble_Overall}} {{ensemble_costFunction}} x N Can we use a special name for the combined table, e.g. ensemble? The problem is that the user may have already created a table named ensemble, which may cause confusion. Any ideas on this problem? Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. 
The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
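The naming scheme described in the comment above (one overall metric plus one metric per cost function, prefixed by the table name or by ensemble) can be sketched as follows. The class name, the method name, and the cost function names used in the usage note are illustrative assumptions; the actual patch may build its names differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists the metric names reported for one balancing unit. */
final class BalancerMetricNames {
    /**
     * @param tableName a real table name in per-table mode, or "ensemble" in ensemble mode
     * @param costFunctionNames the names of the N cost functions
     * @return N + 1 names: the overall cost first, then one entry per cost function
     */
    static List<String> namesFor(String tableName, List<String> costFunctionNames) {
        List<String> names = new ArrayList<>();
        names.add(tableName + "_Overall");
        for (String fn : costFunctionNames) {
            names.add(tableName + "_" + fn);
        }
        return names;
    }
}
```

In per-table mode the method would be called once per table, giving (N + 1) names for each of Table1 and Table2; in ensemble mode it would be called once with the name ensemble, giving N + 1 names in total.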
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-13965: - Attachment: HBASE-13965-v6.patch Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-13965: - Attachment: (was: HBASE-13965-v6.patch) Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618997#comment-14618997 ] Lei Chen commented on HBASE-13965: -- Yes, sounds good, since the full list of cost functions should be known to the user. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619554#comment-14619554 ] Lei Chen commented on HBASE-13965: -- Thanks [~clayb] for the suggestion. I have found that the stochastic load balancer holds a reference to HMaster, which can be used to get the number of tables, so the size of the map can be determined from that. There is no need for a configurable value. I will update the patch soon. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617922#comment-14617922 ] Lei Chen commented on HBASE-13965: -- Thanks for testing the patch and posting the resulting metrics. I agree that percentages are easier to read at a glance. I will update the patch. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-13965: - Attachment: HBASE-13965-v6.patch I'm having difficulty creating a request on Review Board. When I upload a patch file generated by git diff --no-prefix master, I always get a "No valid separator after the filename was found in the diff header" error. I'm working on it; in the meantime I'm still attaching the patch file here. Updates (trivial changes from v5 to v6): 1. Renamed some variables with more accurate names. 2. Use a percentage for each cost function. TODO: 1. Make the hard-coded map size configurable? Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
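The second update above reports each cost function as a percentage. A minimal sketch of that conversion, assuming each sub-cost is a non-negative number and the overall cost is their sum, could look like this. The class and method names are hypothetical, and whether the patch applies weights before summing is not stated in the message.

```java
/** Hypothetical helper that converts sub-costs into percentage shares of the total. */
final class CostPercent {
    static double[] percentages(double[] subCosts) {
        double total = 0.0;
        for (double c : subCosts) {
            total += c;
        }
        double[] pct = new double[subCosts.length];
        if (total == 0.0) {
            // No cost at all: report every share as 0 rather than dividing by zero.
            return pct;
        }
        for (int i = 0; i < subCosts.length; i++) {
            pct[i] = 100.0 * subCosts[i] / total;
        }
        return pct;
    }
}
```

Reporting shares rather than raw values makes it easier to see at a glance which cost function dominates the overall cost, which is the readability point made earlier in the thread.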
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615518#comment-14615518 ]

Lei Chen commented on HBASE-13965:

Thanks for your review and great feedback. I will upload an updated patch.
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Chen updated HBASE-13965:
Attachment: HBASE-13965-v4.patch

Updates:
1. report -> reports
2. costFunctionDesc added to JMX.
3. The unnecessary table name length check is removed.
4. lastSubcosts -> lastSubCosts
5. total += this.lastSubCosts[i];

TODO:
1. Make the hard-coded map size configurable?
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615909#comment-14615909 ]

Lei Chen commented on HBASE-13965:

Good point. It is more memory efficient to store the description only once for each cost function. I will update the patch.
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Chen updated HBASE-13965:
Attachment: HBASE-13965-v5.patch

Updates:
1. Only one copy of the description is saved for each cost function, in a separate map.

TODO:
1. Make the hard-coded map size configurable?
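The v5 change (one description per cost function, kept in a separate map) can be sketched as follows. This is an illustration under assumptions: the class CostMetricsStore and its accessors are hypothetical names, and the real patch may organize its maps differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical store (not from the patch): values per table, descriptions once per cost function. */
final class CostMetricsStore {
  // tableName -> (costFunctionName -> latest value)
  private final Map<String, Map<String, Double>> valuesByTable = new HashMap<>();
  // costFunctionName -> description, shared by every table
  private final Map<String, String> descriptions = new HashMap<>();

  void update(String tableName, String costFunctionName, String costFunctionDesc, double value) {
    valuesByTable.computeIfAbsent(tableName, t -> new HashMap<>()).put(costFunctionName, value);
    // The description is stored only the first time a cost function is seen.
    descriptions.putIfAbsent(costFunctionName, costFunctionDesc);
  }

  Double value(String tableName, String costFunctionName) {
    Map<String, Double> values = valuesByTable.get(tableName);
    return values == null ? null : values.get(costFunctionName);
  }

  String description(String costFunctionName) {
    return descriptions.get(costFunctionName);
  }

  int descriptionCount() {
    return descriptions.size();
  }
}
```

With this layout the number of stored descriptions grows with the number of cost functions rather than with tables times cost functions.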
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Chen updated HBASE-13965:
Attachment: HBASE-13965-v3.patch

Updates:
1. The size of the ever-growing map is now limited to 1000 entries (hard-coded) by using a most-recently-used (MRU) cache.
2. Checkstyle warnings fixed.

TODO:
1. Make the hard-coded map size configurable?
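One common way to cap a map while keeping the most recently used entries is an access-ordered LinkedHashMap that drops its eldest entry on overflow. The sketch below assumes that approach; the thread does not say how the v3 patch implements its cache, and BoundedTableCache is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical bounded cache (not from the patch): keeps the most recently used
 * tables and evicts the least recently used one when the limit is exceeded.
 * Not thread-safe; callers would need external synchronization.
 */
final class BoundedTableCache<V> extends LinkedHashMap<String, V> {
  private static final long serialVersionUID = 1L;
  private final int maxSize;

  BoundedTableCache(int maxSize) {
    // accessOrder = true: get() and put() move an entry to the "most recent" end.
    super(16, 0.75f, true);
    this.maxSize = maxSize;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
    // Called after every insertion; returning true removes the least recently used entry.
    return size() > maxSize;
  }
}
```

Constructing it as new BoundedTableCache<>(1000) would mirror the hard-coded limit mentioned above, and passing the limit from configuration would address the TODO.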
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610601#comment-14610601 ]

Lei Chen commented on HBASE-13965:

I agree that the unused balancers should be purged or made into attributes of the stochastic load balancer. I think it would be better to do that in a separate Jira, following “one thing at a time”.

For the ever-growing map, I am considering two ways to solve the problem:
1. Besides updateStochasticCost, add another method (or a boolean parameter) that is called when a table is deleted. This lets the map contain only existing tables.
2. Use a fixed-size most-recently-used (MRU) cache to store the map. The size can be configurable.

Any suggestions?
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Chen updated HBASE-13965:
Attachment: HBASE-13965_v2.patch

Changes:
1. License added for new classes.
2. Javadoc updated.
3. Several commits squashed into one.
4. Null checks use != null, not null !=.

TODO:
1. The ever-growing map in MetricsStochasticBalancerSourceImpl.java
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609412#comment-14609412 ]

Lei Chen commented on HBASE-13965:

Thanks for the feedback. I appreciate your detailed review and help. I will modify the patch.
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Chen updated HBASE-13965:
Attachment: HBase-13965-v1.patch
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Chen updated HBASE-13965:
Attachment: stochasticloadbalancerclasses_v2.png

Class diagram before and after the patch. Other balancers will work the same way as before.
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606864#comment-14606864 ]

Lei Chen commented on HBASE-13965:

Sorry for the delay. I propose the following changes, which are illustrated in the attached class diagram.

The {{StochasticLoadBalancer}} extends {{BaseLoadBalancer}}, which has a class variable of type {{MetricsBalancer}}. The {{MetricsBalancer}} contains a private class variable of type {{MetricsBalancerSource}}. {{MetricsBalancerSource}} is an interface that defines which metrics can be reported to JMX. This makes it difficult to add metrics that are specific to one load balancer implementation (e.g. the {{StochasticLoadBalancer}}). Adding such metrics to the generic interface is not appropriate, because the interface is used by all load balancers and should not contain any balancer-specific metrics.

I propose to create a class extending {{MetricsBalancer}} to provide the balancer-specific metrics. To use this class, I propose to add a constructor to {{BaseLoadBalancer}} that allows the metrics class for the balancer instance to be passed in. (Thanks [~enis] for the code review and for suggesting the constructor!) In the constructor of {{StochasticLoadBalancer}}, an instance of {{MetricsStochasticBalancer}} is created and passed to the new {{BaseLoadBalancer}} constructor, which uses it in place of the default {{MetricsBalancer}}.

The function used to add metrics is declared as follows:
{code}
public void updateStochasticCost(String tableName, String costFunctionName, String costFunctionDesc, Double value);
{code}

In {{MetricsBalancer}}, the {{private final}} class variable {{source}} was previously hardcoded and instantiated in the constructor; I propose a new function {{initSource}} that can be overridden to set this variable. In the subclass {{MetricsStochasticBalancer}}, {{initSource}} will then create a {{MetricsStochasticBalancerSource}} instance instead of the default {{MetricsBalancerSource}}.

Finally, to give good insight into the internal status of {{StochasticLoadBalancer}}, we are considering adding a metric for each cost function, as well as one for the overall cost. For example, if balancing is carried out for table MyTable1 and the {{StochasticLoadBalancer}} has 3 cost functions (MoveCost, LocalityCost, and RegionReplicaHostCost), then 4 metrics will be added to “HBase - Master - Balancer” as follows:
MyTable1_Overall
MyTable1_MoveCost
MyTable1_LocalityCost
MyTable1_RegionReplicaHostCost

I am building the patch; any suggestions are appreciated.
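The proposal in the comment above can be sketched as a small standalone program. The classes below are simplified stand-ins that reuse the class names from the comment; their bodies are assumptions, not HBase source. In particular, the report method and the publishedMetrics accessor are hypothetical, the "source" stores values in a map instead of publishing to JMX, and the field is not final here because an overridable initSource has to assign it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for the generic metrics interface shared by all balancers (left empty here).
interface MetricsBalancerSource {}

// Stand-in for the stochastic-specific source; records values in a map instead of JMX.
class MetricsStochasticBalancerSource implements MetricsBalancerSource {
  private final Map<String, Double> published = new LinkedHashMap<>();

  public void updateStochasticCost(String tableName, String costFunctionName,
      String costFunctionDesc, Double value) {
    // Metric names follow the <table>_<costFunction> pattern from the comment.
    published.put(tableName + "_" + costFunctionName, value);
  }

  Map<String, Double> published() {
    return published;
  }
}

class MetricsBalancer {
  protected MetricsBalancerSource source;

  MetricsBalancer() {
    initSource();
  }

  // Overridable hook that decides which source implementation is used.
  protected void initSource() {
    source = new MetricsBalancerSource() {};
  }
}

class MetricsStochasticBalancer extends MetricsBalancer {
  // No initializer on purpose: initSource() runs inside the superclass constructor,
  // and an explicit initializer here would overwrite what it assigned.
  private MetricsStochasticBalancerSource stochasticSource;

  @Override
  protected void initSource() {
    stochasticSource = new MetricsStochasticBalancerSource();
    source = stochasticSource;
  }

  void updateStochasticCost(String tableName, String costFunctionName,
      String costFunctionDesc, Double value) {
    stochasticSource.updateStochasticCost(tableName, costFunctionName, costFunctionDesc, value);
  }

  Map<String, Double> published() {
    return stochasticSource.published();
  }
}

class BaseLoadBalancer {
  protected final MetricsBalancer metricsBalancer;

  BaseLoadBalancer() {
    this(new MetricsBalancer());
  }

  // The added constructor: lets a subclass supply its own metrics object.
  BaseLoadBalancer(MetricsBalancer metricsBalancer) {
    this.metricsBalancer = metricsBalancer;
  }
}

class StochasticLoadBalancer extends BaseLoadBalancer {
  StochasticLoadBalancer() {
    super(new MetricsStochasticBalancer());
  }

  // Hypothetical reporting step: one metric per cost function plus the overall cost.
  void report(String tableName, Map<String, Double> subCosts) {
    MetricsStochasticBalancer metrics = (MetricsStochasticBalancer) metricsBalancer;
    double overall = 0.0;
    for (Map.Entry<String, Double> e : subCosts.entrySet()) {
      metrics.updateStochasticCost(tableName, e.getKey(), e.getKey() + " (placeholder description)", e.getValue());
      overall += e.getValue();
    }
    metrics.updateStochasticCost(tableName, "Overall", "Overall cost", overall);
  }

  Map<String, Double> publishedMetrics() {
    return ((MetricsStochasticBalancer) metricsBalancer).published();
  }
}
```

Calling report("MyTable1", ...) with MoveCost, LocalityCost, and RegionReplicaHostCost yields the four metric names listed in the comment. Calling an overridable method from a constructor is fragile in Java, which is why the subclass field carries no initializer in this sketch.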
[jira] [Assigned] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Chen reassigned HBASE-13965:
Assignee: Lei Chen
[jira] [Created] (HBASE-13965) Stochastic Load Balancer JMX Metrics
Lei Chen created HBASE-13965: Summary: Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)